µjava compiler

Metadata

creation year: 2017

links: project

tags: µjava compiler java

This repository contains two different projects:

A compiler for the µjava language, written in Java.
Two Coco configuration files. They are not explained here, as their importance and complexity are rather low compared to the compiler.

This project was created during my studies, so I will not develop it further, but I wish to keep it for future reference.

µjava Compiler

Complete documentation, with explanations of the implementation and of the test results, can be found here.

The compiler can be found in the Code/Compiler/ directory. The goal of this software is to be able to read and parse µjava files in order to create bytecode files that can be executed.

The compiler is composed of two main components.

Scanner

The goal of the scanner is to read a file, detect each character written in it and distinguish between them. Because this compiler only compiles μjava code, the scanner has to understand only the syntax of this language, and should therefore return an error when the scanned text contains any non-μjava character.

Furthermore, it should recognise the keywords of the scanned code as such, so it needs to translate the content of the scanned file into a sequence of tokens. A token is a scanned piece of code: it can be a number or a letter, but also a keyword such as 'if' or 'else', or an identifier.
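As a rough illustration of how text becomes tokens, the sketch below splits an input string into keyword, identifier, number and semicolon tokens. The class name MiniScanner, the keyword subset and the string-based token kinds are assumptions made for this example; this is not the scanner found in Code/Compiler/.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified tokenizer (not the scanner shipped in Code/Compiler/).
class MiniScanner {
    // Assumed subset of keywords, for illustration only.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("if", "else", "while", "return"));

    // Returns each token as "kind" or "kind value".
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                // A word is either a reserved keyword or a plain identifier.
                tokens.add(KEYWORDS.contains(word) ? word : "ident " + word);
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add("number " + src.substring(start, i));
            } else if (c == ';') {
                tokens.add("semicolon");
                i++;
            } else {
                // Anything else is reported and skipped so scanning can continue.
                tokens.add("error: unknown character " + c);
                i++;
            }
        }
        return tokens;
    }
}
```

For instance, MiniScanner.tokenize("if size 5;") yields the four tokens if, ident size, number 5 and semicolon.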

The proper functioning of the scanner is tested with the provided Java class TestScanner, available in the TestData directory. Its purpose is simply to run the scanner on an input file, then print some information returned by the scanner.

For each detected token, the information is the following:

The line on which the token has been detected.
The column on which it starts.
The kind of the token.
The value of the token.

TestScanner prints a formatted version of that information. For example, when the scanner detects a semicolon at line 1, column 1, this is the expected output:

line 1, col 1: ;

As the scanner returns the kind of the token as a keyword, TestScanner is able to translate it into the corresponding character. For the previous example, the scanner would return 'semicolon' as the kind of token. Finally, when the token is a number or an identifier, the value of the token is also printed, for example:

line 2, col 1: number 5
line 3, col 5: ident size
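The formatting described above could be sketched as follows. TokenInfo, its field names and the use of strings for token kinds are hypothetical choices for this example; the real scanner and TestScanner may represent tokens differently.

```java
// Hypothetical token holder; names are assumptions made for this sketch.
class TokenInfo {
    final int line;      // line on which the token was detected
    final int col;       // column on which it starts
    final String kind;   // e.g. "semicolon", "number", "ident"
    final String value;  // only meaningful for numbers and identifiers

    TokenInfo(int line, int col, String kind, String value) {
        this.line = line;
        this.col = col;
        this.kind = kind;
        this.value = value;
    }

    // Formats the token in the "line L, col C: ..." style shown above.
    String describe() {
        String text;
        switch (kind) {
            case "semicolon":
                text = ";";                 // kind translated to its character
                break;
            case "number":
            case "ident":
                text = kind + " " + value;  // kind followed by the token value
                break;
            default:
                text = kind;                // fall back to the kind name
        }
        return "line " + line + ", col " + col + ": " + text;
    }
}
```

With this sketch, new TokenInfo(3, 5, "ident", "size").describe() returns the string "line 3, col 5: ident size".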

Tests

This part has been tested with three different provided input files:

sample.mj: This file contains a syntactically correct μjava program. The test should return a formatted line for each token of the program, as shown before.
Eratos.mj: Again, this file is a working μjava program, so the expected output is of the same kind as for sample.mj.
BuggyScannerInput.mj: This time the file is deliberately incorrect. The first part of the file does not contain any error and should therefore be scanned like the two previous files. However, it ends with six errors that the scanner needs to handle gracefully, which implies reading the full document without crashing and reporting the correct number of errors and their positions. The scanner is expected to return four kinds of error messages for BuggyScannerInput.mj:
- "Overflow" for the illegal number.
- "Apostrophe missing" for the illegal character constant with only one apostrophe.
- "Unknown character" for the unknown token.
- "Invalid character" for the three remaining illegal character constants.
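To illustrate how a scanner can report such errors and keep going, here is a minimal sketch covering the "Overflow", "Apostrophe missing" and "Invalid character" cases. The class ScanErrors, the exact rules for character constants and the "line, col" message prefix are assumptions for this example, not the behaviour of the actual scanner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error collector: errors are recorded, never thrown,
// so the caller can continue scanning the rest of the file.
class ScanErrors {
    final List<String> messages = new ArrayList<>();

    private void report(int line, int col, String msg) {
        messages.add("line " + line + ", col " + col + ": " + msg);
    }

    // Converts a non-empty run of digits; returns 0 and records "Overflow"
    // when the value does not fit in an int.
    int number(String digits, int line, int col) {
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            report(line, col, "Overflow");
            return 0;
        }
    }

    // Checks a lexeme that starts with an apostrophe, e.g. 'a'.
    // Assumed rules: it must end with a closing apostrophe and contain
    // exactly one character between the two apostrophes.
    char charConstant(String lexeme, int line, int col) {
        boolean closed = lexeme.length() >= 2 && lexeme.endsWith("'");
        if (!closed) {
            report(line, col, "Apostrophe missing");
            return '\0';
        }
        if (lexeme.length() != 3) {
            report(line, col, "Invalid character");
            return '\0';
        }
        return lexeme.charAt(1);
    }
}
```

Returning a neutral value after each error, instead of throwing an exception, is what lets the scan reach the end of the file and report the total number of errors.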

By testing the scanner with these three files, we can be confident that it works correctly. The contents of those files are available in the appendices of this document.

Parser

The second part of this compiler is the parser, which needs to fulfill two different tasks: managing the symbol table and generating code. The symbol table stores all declared names of the μjava program being parsed, together with their properties, and allows them to be retrieved.

The parser should therefore be able to add a new entry to the table, with its properties, any time the program declares a name, and also be capable of fetching it whenever the program uses it.

μjava, like most programming languages, uses scopes for variables, so our symbol table should be able to store the corresponding scope for each stored variable. This is essential when the code is executed by the μjava virtual machine.
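One common way to handle scopes is a stack of maps, sketched below. ScopedSymbolTable and its methods are hypothetical names, and the entry is reduced to a type name stored as a string; the symbol table in this repository may be organised differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped symbol table: one map per scope, innermost scope first.
class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        openScope(); // the outermost (global) scope
    }

    void openScope() {
        scopes.push(new HashMap<String, String>());
    }

    void closeScope() {
        scopes.pop();
    }

    // Declares a name in the current scope.
    // Returns false if the name is already declared in that same scope.
    boolean insert(String name, String type) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(name)) {
            return false;
        }
        current.put(name, type);
        return true;
    }

    // Looks a name up from the innermost scope outwards; null if undeclared.
    String find(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }
}
```

Because the lookup starts at the innermost scope, a local declaration hides a global one with the same name until closeScope() is called.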

Finally, once the table management is done, we want the parser to generate machine instructions, in our case bytecode for the μjava virtual machine.
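Code generation essentially means appending opcodes and operands to a byte buffer as the parser recognises constructs. The sketch below shows the idea for the expression 2 + 3. The class CodeBuffer, the opcode numbers and the 4-byte operand layout are invented for illustration and do not reflect the real μjava virtual machine instruction set.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical code buffer; opcode values are invented for this sketch.
class CodeBuffer {
    static final int CONST = 1; // push a 4-byte constant (assumed encoding)
    static final int ADD = 2;   // pop two values, push their sum (assumed)

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Appends a single byte.
    void put(int b) {
        out.write(b & 0xFF);
    }

    // Appends a 32-bit value, most significant byte first.
    void put4(int v) {
        put(v >> 24);
        put(v >> 16);
        put(v >> 8);
        put(v);
    }

    // Emits the instruction that pushes a constant.
    void loadConst(int v) {
        put(CONST);
        put4(v);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

Calling loadConst(2), loadConst(3) and put(CodeBuffer.ADD) in that order fills the buffer with 11 bytes: two 5-byte constant loads followed by the 1-byte add.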

Tests

This part has been tested with three different provided input files:

sample.mj: Here again, as this program is correct, our parser should be able to parse the content of the file and compile it.
Eratos.mj: As for sample.mj, the parser should be able to parse and compile it without any error.
BuggyParserInput.mj: As for the scanner, this file is intentionally incorrect and should produce several errors. We also need the parser to read the entire file and not stop when encountering an error, as we want to inform the user of all errors in a single run. The program contains eight errors that should be reported by the parser. There are only four different kinds of errors, and therefore of error messages, that should be returned:
- ""{" expected": in this case the brace is only an example, as we need to handle every missing character of the program.
- "invalid assignment or call": occurs when the program uses an assignment or call symbol that is not recognised by the μjava syntax.
- "invalid expression": should be returned every time an expression is invalid, like a function call without an attribute.
- "incompatible type in assignment": should be returned when the type of the value is not compatible with the type of the variable in an assignment.
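The requirement to keep parsing after an error can be illustrated with a very small sketch. MiniParser, its token representation (plain strings) and the toy grammar of blocks of the form { ident ; } are all assumptions made for this example; only the "... expected" message style comes from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser over a list of token kinds, for a toy grammar only.
class MiniParser {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    MiniParser(List<String> tokens) {
        this.tokens = tokens;
    }

    private String current() {
        return pos < tokens.size() ? tokens.get(pos) : "eof";
    }

    // Consumes the expected token, or records an error and carries on
    // as if the token had been present.
    private void check(String expected) {
        if (current().equals(expected)) {
            pos++;
        } else {
            errors.add("\"" + expected + "\" expected");
        }
    }

    // Parses a sequence of blocks of the assumed form: { ident ; }
    void parseBlocks() {
        while (!current().equals("eof")) {
            int start = pos;
            check("{");
            check("ident");
            check(";");
            check("}");
            if (pos == start) {
                pos++; // skip a token that fits nowhere, to guarantee progress
            }
        }
    }
}
```

In this sketch a missing token is reported once and parsing resumes at the next expected symbol, so several independent errors are collected in one run. A production parser would usually also suppress follow-up messages caused by the same mistake.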