µjava compiler

creation year: 2017

This repository contains two different projects:

This project was created during my studies. I do not plan to work on it further, but I am keeping it for future reference.

µjava Compiler

Complete documentation, with explanations of the implementation and of the test results, can be found here.

The compiler can be found in the Code/Compiler/ directory. Its goal is to read and parse µjava files and to produce bytecode files that can be executed.

The compiler is composed of two main components.


The goal of the scanner is to read a file and identify each character in it. Because this compiler targets μjava, the scanner only has to understand the syntax of that language, and it should therefore return an error when the scanned text contains any character that is not valid in μjava.

The scanner should also recognize the keywords of the scanned code, so it translates the content of the scanned file into a sequence of tokens. A token is a scanned unit of code: it can be a number or a letter, but also an identifier or a keyword such as ’if’ or ’else’.
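
As a rough illustration of the tokenizing described above, here is a minimal sketch in Java. The class names `Token` and `MiniScanner`, the kind names, and the small keyword set are assumptions made for this example; they are not the actual classes found in Code/Compiler/, and only a fraction of the language is handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical token type: kind, position and scanned text.
final class Token {
    final String kind;   // e.g. "number", "ident", "if", "semicolon"
    final int line;
    final int col;
    final String value;  // the characters that were scanned

    Token(String kind, int line, int col, String value) {
        this.kind = kind;
        this.line = line;
        this.col = col;
        this.value = value;
    }
}

final class MiniScanner {
    // Assumed keyword subset, for illustration only.
    private static final Set<String> KEYWORDS = Set.of("if", "else", "while");

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        int line = 1, col = 1, i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') { line++; col = 1; i++; continue; }
            if (c == ' ' || c == '\t' || c == '\r') { col++; i++; continue; }
            int start = i;
            String kind;
            if (isDigit(c)) {
                while (i < src.length() && isDigit(src.charAt(i))) i++;
                kind = "number";
            } else if (isLetter(c)) {
                while (i < src.length()
                        && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)))) i++;
                String word = src.substring(start, i);
                // Keywords get their own kind; everything else is an identifier.
                kind = KEYWORDS.contains(word) ? word : "ident";
            } else if (c == ';') {
                i++;
                kind = "semicolon";
            } else {
                // Any character outside the supported set is reported as an error.
                throw new IllegalArgumentException(
                    "line " + line + ", col " + col + ": invalid character '" + c + "'");
            }
            tokens.add(new Token(kind, line, col, src.substring(start, i)));
            col += i - start;
        }
        return tokens;
    }
}
```

Calling `MiniScanner.scan("if x1\n  42;")` yields four tokens (a keyword, an identifier, a number and a semicolon), each carrying the line and column where it starts.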

The scanner is tested with the provided Java class TestScanner, available in the TestData directory. Its purpose is simply to run the scanner on an input file and then print the information returned by the scanner.

For each detected token, the information is the following:

The TestScanner script prints a formatted version of that information. For example, when the scanner detects a semicolon at line 1, column 1, this is the expected output:

line 1, col 1: ;

Because the scanner reports the kind of each token as a keyword, the test script can translate it into the corresponding character. In the previous example the scanner would return ’semicolon’ as the kind of token. Finally, when the token is a number or an identifier, the value of the token is also printed, for example:

line 2, col 1: number 5
line 3, col 5: ident size
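
The output lines shown above could be produced by a small helper like the one sketched below. The class `TokenPrinter`, its `format` method and the kind-to-symbol table are hypothetical and only cover a few token kinds; the real TestScanner may be organized differently.

```java
import java.util.Map;

final class TokenPrinter {
    // Assumed mapping from token kind names to the printed symbol.
    private static final Map<String, String> SYMBOLS = Map.of(
        "semicolon", ";",
        "comma", ",",
        "lpar", "(",
        "rpar", ")");

    // Builds one output line for a token, given its position, kind and value.
    static String format(int line, int col, String kind, String value) {
        String text;
        if (kind.equals("number") || kind.equals("ident")) {
            // Numbers and identifiers are printed together with their value.
            text = kind + " " + value;
        } else {
            // Other kinds are translated to their symbol when one is known.
            text = SYMBOLS.getOrDefault(kind, kind);
        }
        return String.format("line %d, col %d: %s", line, col, text);
    }
}
```

With this sketch, `TokenPrinter.format(1, 1, "semicolon", null)` returns `line 1, col 1: ;` and `TokenPrinter.format(3, 5, "ident", "size")` returns `line 3, col 5: ident size`.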


This part has been tested with three different provided input files:

Testing the scanner with these three files gives reasonable confidence that it works correctly. The contents of those files are available in the appendices of this document.


The second part of this compiler is the parser, which has to fulfill two different tasks: managing a symbol table and generating code. The symbol table stores and retrieves all declared names, together with their properties, of the μjava program being parsed.

The parser should therefore be able to add a new entry, with its properties, to the table whenever the program declares a name, and to fetch that entry whenever the program uses it.

Like most programming languages, μjava uses scopes for variables, so the symbol table must also store the scope of each variable. This is essential when the code is executed by the μjava virtual machine.
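
One common way to model a scoped symbol table is a stack of maps, one map per open scope. The sketch below follows that idea. The class `SymbolTable`, its `Entry` type and the method names are assumptions for illustration, not the structure actually used by this compiler.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class SymbolTable {
    // Hypothetical entry: a declared name, its type and its scope level.
    static final class Entry {
        final String name;
        final String type;
        final int level;  // 0 = outermost scope

        Entry(String name, String type, int level) {
            this.name = name;
            this.type = type;
            this.level = level;
        }
    }

    // Head of the deque is the innermost (most recently opened) scope.
    private final Deque<Map<String, Entry>> scopes = new ArrayDeque<>();

    SymbolTable() { openScope(); }

    void openScope() { scopes.push(new HashMap<>()); }

    void closeScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot close the outermost scope");
        }
        scopes.pop();
    }

    // Adds a declaration to the current scope; redeclaration is an error.
    Entry insert(String name, String type) {
        Map<String, Entry> current = scopes.peek();
        if (current.containsKey(name)) {
            throw new IllegalArgumentException(name + " already declared in this scope");
        }
        Entry entry = new Entry(name, type, scopes.size() - 1);
        current.put(name, entry);
        return entry;
    }

    // Looks a name up from the innermost scope outwards; null if undeclared.
    Entry find(String name) {
        for (Map<String, Entry> scope : scopes) {
            Entry entry = scope.get(name);
            if (entry != null) return entry;
        }
        return null;
    }
}
```

Because lookup walks from the innermost scope outwards, a local declaration shadows an outer one with the same name until its scope is closed.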

Finally, once the symbol table management is in place, the parser has to generate machine instructions, in our case bytecode for the μjava virtual machine.


This part has been tested with three different provided input files: