Assignment title: Information
Assignment 4 Jack Compiler
Assignment 4 Jack Compiler
Project Description
In this assignment you will complete a variation of project 10 in the nand2tetris course, a detailed
description Nand2Tetris Project 10 is shown below.
SVN Repository
You must create a directory in your svn repository named: //cs/assignment4. This
directory must only contain the following files and directories:
Makefile this is file used by make to compile your submission. A skeleton is attached below.
.cpp files.
.h files.
tests this directory can be used to store any extra files you need for the assignment such as test
input and output files.
Submission and Marking Scheme
This assignment has two assignments in the web submission system named: Assignment 4 Milestone
Submissions and Assignment 4 Final Submissions.
Assignment 4 Milestone Submissions: due 11:55pm Friday of week 11
The marks awarded by the web submission system for the milestone submission contribute up to 20% of
your marks for assignment 4.
Your compiler must be written in C++. Only the implementation of your Tokeniser, and its driver
program, named JackTokeniser will be marked as part of the milestone submission. Any additional
implementation work will be ignored.
The following command must be used to compile your Tokeniser and name its driver program's executable
file, JackTokeniser:
Computer Systems
% make JackTokeniser
An example Makefile is attached below which you should adapt to match the names of your source files.
You must add the names of all the .h files your JackTokeniser program needs to line that starts
TOKENH= and the names of all the .cpp files your JackTokeniser program needs to the line that starts
TOKENCPP=.
Assignment 4 Final Submissions: due 11:55pm Friday of week 12
The marks awarded contribute up to 80% of your marks for assignment 4.
As for all four programming assignments this semester your mark for the final submission will be
calculated as:
min(marks_for_code,marks_for_log_book)
this makes your logbook vital to your assignment mark. Important: the logbook must have entries for all
work in this assignment, including your milestone submissions. A detailed description of what is expected
from your logbook for this assignment is available here.
The marks_for_code are derived from a combination of automatic marking in the websubmission system
and a manual review of your code. Your compiler will be tested using Jack language programs that may or
may not be syntactically correct. The tests will include the syntactically correct examples described below.
The syntax errors will be created by adding, removing or modifying a single token in one of the examples.
The following command will be used to compile your Tokeniser and Parser with their driver programs'
executable files being named, JackTokeniser and JackParser respectively:
% make JackTokeniser JackParser
An example Makefile is attached below which you should adapt to match the names of your source files.
You must add the names of all the .h files your JackParser program needs to line that starts PARSERH=
and the names of all the .cpp files your JackParser program needs to the line that starts PARSERCPP=.
The manual review of your code will look at your coding style, your commenting and program structure.
We expect that all code that you write will be laid out consistently, it will be easy to read, it will have a clear
structure, it will not be a single main function, each function will be prefixed by an appropriate comment
and any key logic or data structures will also be appropriately commented. In most cases the mark
awarded for the manual review will cap the mark awarded by the web submission system but will not
reduce it below 50%. However, if you have a very low mark or possibly zero from the web submission
system your mark may be increased but not above 25%.
Nand2Tetris Project 10: Compiler I Syntax Analysis
Background
Modern compilers, like those of Java and C#, are twotiered: the compiler's frontend translates from the
highlevel language to an intermediate VM language; the compiler's backend translates further from the
VM language to the native code of the host platform. In projects 7 and 8 we built the backend tier of the
Jack Compiler (we called it the VM Translator); we now turn to construct the compiler's frontend. This
construction will span two projects: syntax analysis (this project), and code generation (next project).
Welcome to syntax analysis.
Objective
In this project we build a Syntax Analyser that parses Jack programs according to the Jack grammar,
producing an XML file that captures the program's structure. In the next project, the logic that generates
passive XML output will be morphed into logic that generates VM code.
Contract
Write a Syntax Analyser for the Jack language. Use it to parse all the .jack class files supplied below. For
each input .jack file, your analyser should generate an .xml output file reflecting its grammar. The
generated files should be identical to the supplied comparefiles.
Resources
The relevant reading for this project is Chapter 10. You will need the programming language (you must
use C++) with which you will implement your Syntax Analyser, the Linux command diff, which allows
comparing the output files generated by your analyser to the example output files supplied by us, and a
class to generate the XML output. In order to generate XML in your driver programs, you must use the
myxml class implementation attached below. All the test files necessary for this project are available in
projects/10 on your computer or in the files attached below.
Proposed Implementation
Chapter 10 includes a proposed, languageindependent Syntax Analyser API, which can serve as your
implementation's blueprint. We propose implementing the Syntax Analyser in two stages. First, write and
test a Tokeniser module. Next, write and test a Parser module. Finally, modify the Parser module to report
syntax errors.
Stage I: Tokeniser
A tokeniser is a program that breaks a given textual input into a list of separate tokens. And while it is at it,
it can also classify the tokens into lexical categories. This is the first task that syntax analysers normally
do. Your first task it to implement the Jack tokeniser module specified in chapter 10. When applied to a text
file containing Jack code, the tokeniser should produce a list of tokens, each printed in a separate line
along with its classification: symbol, keyword, identifier, integer constant or string constant. Here is an
example:
Source code (input) Tokeniser output
if ( x < 0 )
{
let state = "negative";
}
if
(
x
<
0
)
{
let
state
=
negative
;
}
Note that in the case of string constants, the tokeniser throws away the doublequote characters. This
behavior is intended, and is part of our tokeniser specification.
Note that five of the symbols that can be used in the Jack language (<, >, ", ', and &) are also used for
XML markup, and thus they cannot appear verbatim as XML data. To solve the problem, we require the
tokeniser to output these tokens as <, >, ", ' and &, respectively. For example, in
order for the text "<" to be displayed properly in a web browser, it should be generated
as "<". This will be handled automatically by the myxml class attached below.
VERY IMPORTANT:
Your JackTokeniser driver program must call use a separate class that can supply tokens one at a
time.
The driver program must not read the input directly.
The driver program must ignore any command line arguments.
All input must be read using cin.
All output must be written using the myxml class attached below.
All error messages must be written using cerr.
If no errors occur, the driver program's main function must return 0.
If any errors are reported, the driver program's main function must return 1 or somewhere else in the
program must call exit(1).
Stage II: Parser
Once you are done implementing the tokeniser for the Jack language, you can proceed implementing a
Jack parser. A parser goes over the tokenised text and emits output indicating that it "understood" the
text's grammatical structure. In order to do so, the parser must include methods that look for canonic
structures in a certain language in our case Jack and then emit these structures in some agreed upon
formalism in our case the XML described in Chapter 10.
In both Project 10 and Project 11, the Jack parser is implemented as a module called CompilationEngine.
We recommend following the compilation engine's API, and writing each one of its methods, making sure
that it emits the correct XML output. We also recommend to first write a compilation engine that handles
any given Jack code except for expressions, and unittesting it as explained below.
VERY IMPORTANT:
Your JackParser driver program must call use a separate class that can supply tokens one at a time.
The driver program must not read the input directly.
The driver program must ignore any command line arguments.
All input must be read using cin.
All output must be written using the myxml class attached below.
All error messages must be written using cerr.
If no errors occur, the driver program's main function must return 0.
If any errors are reported, the driver program's main function must return 1 or somewhere else in the
program it must call exit(1).
Stage III: Syntax Errors
Once you can successfully parse syntactically correct Jack programs, you can add error reporting to your
parser module. The syntax errors to be reported are only those where the parser does not find the
expected token in the input. You do not require a symbol table. When an error is detected, the parser must
output to cerr the following error message, followed by a newline:
Syntax error on line: #, expected: E, found T!
Where # is the line number of the last token read, E is the token that the parser expected to see next and
T is the token that was read. Note: keywords must be reported as the actual keyword and symbols must
be reported as the actual symbol. Here are some example syntax errors, followed by the appropriate error
messages:
Example 1: SE1.Jack and the expected output, SE1.txt, are in the SyntaxErrors file attached below.
7: ...
8: let = 3 ; // missing variable name
9: ...
Syntax error on line: 8, expected: identifier, found: =!
Example 2: SE2.Jack and the expected output, SE2.txt, are in the SyntaxErrors file attached below.
2: Class SE2 // keywords are all lower case, 'class' is keyword, 'Class' is an
identifier
3: {
4: ...
Syntax error on line: 2, expected: class, found: identifier!
Example 3: SE3.Jack and the expected output, SE3.txt, are in the SyntaxErrors file attached below.
9: ...
10: if ( x > 1 ) // forgot the '{' and '}' brackets
11: let x = 2 ; // the error will not be detected until keyword 'l
et' on line 11 has been read
12: ...
Syntax error on line: 11, expected: {, found: let!
The output file cerr is separate from cout so we test the error messages separately from normal output,
that is, we ignore anything written to cout when testing your error handling. When a syntax error is
detected, your parser is free to stop or continue as you see fit but it must either return 1 from its main
function or somewhere else in the program is must call exit(1).
Test Programs
A natural way to test your analyser it is to have it parse some representative Jack programs. We supply
two such test programs: Square Dance and Array Test. The former includes all the syntactic features of the
Jack language except for array processing, which appears in the latter. We also provide a simpler version
of the SquareDance program, as explained below.
Square Dance: A simple interactive game, described in Project 9. The game implementation is organised
in three classes:
Source class file Description (not really relevant to this project) Tokeniser output Parser output
Main.jack Initialises and starts a new "square moving session". MainT.xml Main.xml
Square.jack
Implements an animated square. A Square object has
a screen location and size properties, and methods SquareT.xml Square.xml
for drawing, erasing,moving, and size changing.
SquareGame.jack Runs the show according to the game rules. SquareGameT.xml SquareGame.xml
Note: The three source Jack files comprising the Square Dance game are identical to those stored in the
projects/09/Square directory. For completeness, an identical copy of these files is also available in the
projects/10/Square directory.
Expressionless Square Dance: This is a version of the Square Dance program in which each expression in
the original source code was replaced with a single identifier (some variable name in scope). This version
of the program was designed for one purpose only: unittesting the analyser's ability to parse everything
except for expressions. Note that the replacement of expressions with variables has resulted in a
nonsensical code. Still, the code follows all the Jack grammar rules. The expressionless files have the
same names as those of the original files, but they are located in a separate directory
(projects/10/ExpressionlessSquare).
Array Test: a singleclass Jack program designed to test how the Analyser handles array processing:
Source class file Description (not really relevant to this project) Tokeniser output
Main.jack
Computes the average of a usersupplied sequence
of integers using an array data structure and array
manipulation commands.
MainT.xml
Experimenting with the test programs: If you want, you can compile the supplied SquareDance and
TestArray programs using the supplied Jack compiler, then use the supplied VM Emulator to run the
compiled code. These activities are completely irrelevant to this project. However, they serve to highlight
the fact that the test programs are not just plain text (although this is perhaps the best way to think about
them in the context of this project).
Testing
Tokeniser Testing:
This is how the web submission system will test your JackTokeniser so make sure you test
your JackTokeniser on the University systems before submitting your work.
The testing can be done using the following command:
% ./JackTokeniser < Square.jack | diff w SquareT.xml
This runs your JackTokeniser using the file Square.jack as standard input. Any output from
your JackTokeniser is then read by the diff command and compared with expected output in the
file SquareT.xml.
The w flag given to the diff command causes any differences in white space on a line to be ignored.
Note that the indentation of the XML output is only for readability, web browsers ignore white space.
If you not using the University systems you may find that your JackTokeniser appears to compile and
pass the tests but fails to compile or compiles but fails the tests when run on the Web Submission
System. It is your responsibility to make sure that your JackTokeniser runs on the University systems.
Parser Testing:
This is how the web submission system will test your JackParser so make sure you test
your JackParser on the University systems before submitting your work.
The testing can be done using the following command:
% ./JackParser < Square.jack | diff w Square.xml
This runs your JackParser using the file Square.jack as standard input. Any output from your
JackParser is then read by the diff command and compared with expected output in the file
Square.xml.
The w flag given to the diff command causes any differences in white space on a line to be ignored.
Note that the indentation of the XML output is only for readability, web browsers ignore white space.
If you not using the University systems you may find that your JackParser appears to compile and
pass the tests but fails to compile or compiles but fails the tests when run on the Web Submission
System. It is your responsibility to make sure that your JackParser runs on the University systems.
Sytax Error Testing:
This is how the web submission system will test your JackParser so make sure you test
your JackParser on the University systems before submitting your work.
The testing can be done using the following command(s) if you are using the shell bash:
% (./JackParser > /dev/null) < SE1.jack 2>&1 | diff w SE1.txt
or
% (./JackParser > /dev/null) < SE1.jack 2> testoutput
% diff w testoutput SE1.txt
The testing can be done using the following command(s) if you are using the shell tcsh:
% (./JackParser > /dev/null) < SE1.jack |& diff w SE1.txt
or
% (./JackParser > /dev/null) < SE1.jack >& testoutput
% diff w testoutput SE1.txt
These commands run your JackParser using a file SE1.jack as standard input and throws away
anything written to standard output. Any error messages from your JackParser that are written to
standard error are read by the diff command and compared with expected output in a file SE1.txt.
The w flag given to the diff command causes any differences in white space on a line to be ignored.
If you not using the University systems you may find that your JackParser appears to compile and
pass the tests but fails to compile or compiles but fails the tests when run on the Web Submission
System. It is your responsibility to make sure that your JackParser runs on the University systems.
Makefile
myxml.cpp
myxml.h
SyntaxErrors.zip