Assignment title: Information


Assignment 4 ­ Jack Compiler Assignment 4 ­ Jack Compiler Project Description In this assignment you will complete a variation of project 10 in the nand2tetris course, a detailed description Nand2Tetris Project 10 is shown below. SVN Repository You must create a directory in your svn repository named: //cs/assignment4. This directory must only contain the following files and directories: Makefile ­ this is file used by make to compile your submission. A skeleton is attached below. .cpp files. .h files. tests ­ this directory can be used to store any extra files you need for the assignment such as test input and output files. Submission and Marking Scheme This assignment has two assignments in the web submission system named: Assignment 4 ­ Milestone Submissions and Assignment 4 ­ Final Submissions. Assignment 4 ­ Milestone Submissions: due 11:55pm Friday of week 11 The marks awarded by the web submission system for the milestone submission contribute up to 20% of your marks for assignment 4. Your compiler must be written in C++. Only the implementation of your Tokeniser, and its driver program, named JackTokeniser will be marked as part of the milestone submission. Any additional implementation work will be ignored. The following command must be used to compile your Tokeniser and name its driver program's executable file, JackTokeniser: Computer Systems % make JackTokeniser An example Makefile is attached below which you should adapt to match the names of your source files. You must add the names of all the .h files your JackTokeniser program needs to line that starts TOKENH= and the names of all the .cpp files your JackTokeniser program needs to the line that starts TOKENCPP=. Assignment 4 ­ Final Submissions: due 11:55pm Friday of week 12 The marks awarded contribute up to 80% of your marks for assignment 4. As for all four programming assignments this semester your mark for the final submission will be calculated as: min(marks_for_code,marks_for_log_book) this makes your log­book vital to your assignment mark. Important: the log­book must have entries for all work in this assignment, including your milestone submissions. A detailed description of what is expected from your log­book for this assignment is available here. The marks_for_code are derived from a combination of automatic marking in the web­submission system and a manual review of your code. Your compiler will be tested using Jack language programs that may or may not be syntactically correct. The tests will include the syntactically correct examples described below. The syntax errors will be created by adding, removing or modifying a single token in one of the examples. The following command will be used to compile your Tokeniser and Parser with their driver programs' executable files being named, JackTokeniser and JackParser respectively: % make JackTokeniser JackParser An example Makefile is attached below which you should adapt to match the names of your source files. You must add the names of all the .h files your JackParser program needs to line that starts PARSERH= and the names of all the .cpp files your JackParser program needs to the line that starts PARSERCPP=. The manual review of your code will look at your coding style, your commenting and program structure. We expect that all code that you write will be laid out consistently, it will be easy to read, it will have a clear structure, it will not be a single main function, each function will be prefixed by an appropriate comment and any key logic or data structures will also be appropriately commented. In most cases the mark awarded for the manual review will cap the mark awarded by the web submission system but will not reduce it below 50%. However, if you have a very low mark or possibly zero from the web submission system your mark may be increased but not above 25%. Nand2Tetris Project 10: Compiler I ­ Syntax Analysis Background Modern compilers, like those of Java and C#, are two­tiered: the compiler's front­end translates from the high­level language to an intermediate VM language; the compiler's back­end translates further from the VM language to the native code of the host platform. In projects 7 and 8 we built the back­end tier of the Jack Compiler (we called it the VM Translator); we now turn to construct the compiler's front­end. This construction will span two projects: syntax analysis (this project), and code generation (next project). Welcome to syntax analysis. Objective In this project we build a Syntax Analyser that parses Jack programs according to the Jack grammar, producing an XML file that captures the program's structure. In the next project, the logic that generates passive XML output will be morphed into logic that generates VM code. Contract Write a Syntax Analyser for the Jack language. Use it to parse all the .jack class files supplied below. For each input .jack file, your analyser should generate an .xml output file reflecting its grammar. The generated files should be identical to the supplied compare­files. Resources The relevant reading for this project is Chapter 10. You will need the programming language (you must use C++) with which you will implement your Syntax Analyser, the Linux command diff, which allows comparing the output files generated by your analyser to the example output files supplied by us, and a class to generate the XML output. In order to generate XML in your driver programs, you must use the myxml class implementation attached below. All the test files necessary for this project are available in projects/10 on your computer or in the files attached below. Proposed Implementation Chapter 10 includes a proposed, language­independent Syntax Analyser API, which can serve as your implementation's blueprint. We propose implementing the Syntax Analyser in two stages. First, write and test a Tokeniser module. Next, write and test a Parser module. Finally, modify the Parser module to report syntax errors. Stage I: Tokeniser A tokeniser is a program that breaks a given textual input into a list of separate tokens. And while it is at it, it can also classify the tokens into lexical categories. This is the first task that syntax analysers normally do. Your first task it to implement the Jack tokeniser module specified in chapter 10. When applied to a text file containing Jack code, the tokeniser should produce a list of tokens, each printed in a separate line along with its classification: symbol, keyword, identifier, integer constant or string constant. Here is an example: Source code (input) Tokeniser output if ( x < 0 ) { let state = "negative"; } if ( x < 0 ) { let state = negative ; } Note that in the case of string constants, the tokeniser throws away the double­quote characters. This behavior is intended, and is part of our tokeniser specification. Note that five of the symbols that can be used in the Jack language (<, >, ", ', and &) are also used for XML markup, and thus they cannot appear verbatim as XML data. To solve the problem, we require the tokeniser to output these tokens as <, >, ", ' and &, respectively. For example, in order for the text "<" to be displayed properly in a web browser, it should be generated as "<". This will be handled automatically by the myxml class attached below. VERY IMPORTANT: Your JackTokeniser driver program must call use a separate class that can supply tokens one at a time. The driver program must not read the input directly. The driver program must ignore any command line arguments. All input must be read using cin. All output must be written using the myxml class attached below. All error messages must be written using cerr. If no errors occur, the driver program's main function must return 0. If any errors are reported, the driver program's main function must return ­1 or somewhere else in the program must call exit(­1). Stage II: Parser Once you are done implementing the tokeniser for the Jack language, you can proceed implementing a Jack parser. A parser goes over the tokenised text and emits output indicating that it "understood" the text's grammatical structure. In order to do so, the parser must include methods that look for canonic structures in a certain language ­ in our case Jack ­ and then emit these structures in some agreed upon formalism ­ in our case the XML described in Chapter 10. In both Project 10 and Project 11, the Jack parser is implemented as a module called CompilationEngine. We recommend following the compilation engine's API, and writing each one of its methods, making sure that it emits the correct XML output. We also recommend to first write a compilation engine that handles any given Jack code except for expressions, and unit­testing it as explained below. VERY IMPORTANT: Your JackParser driver program must call use a separate class that can supply tokens one at a time. The driver program must not read the input directly. The driver program must ignore any command line arguments. All input must be read using cin. All output must be written using the myxml class attached below. All error messages must be written using cerr. If no errors occur, the driver program's main function must return 0. If any errors are reported, the driver program's main function must return ­1 or somewhere else in the program it must call exit(­1). Stage III: Syntax Errors Once you can successfully parse syntactically correct Jack programs, you can add error reporting to your parser module. The syntax errors to be reported are only those where the parser does not find the expected token in the input. You do not require a symbol table. When an error is detected, the parser must output to cerr the following error message, followed by a newline: Syntax error on line: #, expected: E, found T! Where # is the line number of the last token read, E is the token that the parser expected to see next and T is the token that was read. Note: keywords must be reported as the actual keyword and symbols must be reported as the actual symbol. Here are some example syntax errors, followed by the appropriate error messages: Example 1: SE1.Jack and the expected output, SE1.txt, are in the SyntaxErrors file attached below. 7: ... 8: let = 3 ; // missing variable name 9: ... Syntax error on line: 8, expected: identifier, found: =! Example 2: SE2.Jack and the expected output, SE2.txt, are in the SyntaxErrors file attached below. 2: Class SE2 // keywords are all lower case, 'class' is keyword, 'Class' is an identifier 3: { 4: ... Syntax error on line: 2, expected: class, found: identifier! Example 3: SE3.Jack and the expected output, SE3.txt, are in the SyntaxErrors file attached below. 9: ... 10: if ( x > 1 ) // forgot the '{' and '}' brackets 11: let x = 2 ; // the error will not be detected until keyword 'l et' on line 11 has been read 12: ... Syntax error on line: 11, expected: {, found: let! The output file cerr is separate from cout so we test the error messages separately from normal output, that is, we ignore anything written to cout when testing your error handling. When a syntax error is detected, your parser is free to stop or continue as you see fit but it must either return ­1 from its main function or somewhere else in the program is must call exit(­1). Test Programs A natural way to test your analyser it is to have it parse some representative Jack programs. We supply two such test programs: Square Dance and Array Test. The former includes all the syntactic features of the Jack language except for array processing, which appears in the latter. We also provide a simpler version of the SquareDance program, as explained below. Square Dance: A simple interactive game, described in Project 9. The game implementation is organised in three classes: Source class file Description (not really relevant to this project) Tokeniser output Parser output Main.jack Initialises and starts a new "square moving session". MainT.xml Main.xml Square.jack Implements an animated square. A Square object has a screen location and size properties, and methods SquareT.xml Square.xml for drawing, erasing,moving, and size changing. SquareGame.jack Runs the show according to the game rules. SquareGameT.xml SquareGame.xml Note: The three source Jack files comprising the Square Dance game are identical to those stored in the projects/09/Square directory. For completeness, an identical copy of these files is also available in the projects/10/Square directory. Expressionless Square Dance: This is a version of the Square Dance program in which each expression in the original source code was replaced with a single identifier (some variable name in scope). This version of the program was designed for one purpose only: unit­testing the analyser's ability to parse everything except for expressions. Note that the replacement of expressions with variables has resulted in a nonsensical code. Still, the code follows all the Jack grammar rules. The expressionless files have the same names as those of the original files, but they are located in a separate directory (projects/10/ExpressionlessSquare). Array Test: a single­class Jack program designed to test how the Analyser handles array processing: Source class file Description (not really relevant to this project) Tokeniser output Main.jack Computes the average of a user­supplied sequence of integers using an array data structure and array manipulation commands. MainT.xml Experimenting with the test programs: If you want, you can compile the supplied SquareDance and TestArray programs using the supplied Jack compiler, then use the supplied VM Emulator to run the compiled code. These activities are completely irrelevant to this project. However, they serve to highlight the fact that the test programs are not just plain text (although this is perhaps the best way to think about them in the context of this project). Testing Tokeniser Testing: This is how the web submission system will test your JackTokeniser so make sure you test your JackTokeniser on the University systems before submitting your work. The testing can be done using the following command: % ./JackTokeniser < Square.jack | diff ­w ­ SquareT.xml This runs your JackTokeniser using the file Square.jack as standard input. Any output from your JackTokeniser is then read by the diff command and compared with expected output in the file SquareT.xml. The ­w flag given to the diff command causes any differences in white space on a line to be ignored. Note that the indentation of the XML output is only for readability, web browsers ignore white space. If you not using the University systems you may find that your JackTokeniser appears to compile and pass the tests but fails to compile or compiles but fails the tests when run on the Web Submission System. It is your responsibility to make sure that your JackTokeniser runs on the University systems. Parser Testing: This is how the web submission system will test your JackParser so make sure you test your JackParser on the University systems before submitting your work. The testing can be done using the following command: % ./JackParser < Square.jack | diff ­w ­ Square.xml This runs your JackParser using the file Square.jack as standard input. Any output from your JackParser is then read by the diff command and compared with expected output in the file Square.xml. The ­w flag given to the diff command causes any differences in white space on a line to be ignored. Note that the indentation of the XML output is only for readability, web browsers ignore white space. If you not using the University systems you may find that your JackParser appears to compile and pass the tests but fails to compile or compiles but fails the tests when run on the Web Submission System. It is your responsibility to make sure that your JackParser runs on the University systems. Sytax Error Testing: This is how the web submission system will test your JackParser so make sure you test your JackParser on the University systems before submitting your work. The testing can be done using the following command(s) if you are using the shell bash: % (./JackParser > /dev/null) < SE1.jack 2>&1 | diff ­w ­ SE1.txt or % (./JackParser > /dev/null) < SE1.jack 2> testoutput % diff ­w testoutput SE1.txt The testing can be done using the following command(s) if you are using the shell tcsh: % (./JackParser > /dev/null) < SE1.jack |& diff ­w ­ SE1.txt or % (./JackParser > /dev/null) < SE1.jack >& testoutput % diff ­w testoutput SE1.txt These commands run your JackParser using a file SE1.jack as standard input and throws away anything written to standard output. Any error messages from your JackParser that are written to standard error are read by the diff command and compared with expected output in a file SE1.txt. The ­w flag given to the diff command causes any differences in white space on a line to be ignored. If you not using the University systems you may find that your JackParser appears to compile and pass the tests but fails to compile or compiles but fails the tests when run on the Web Submission System. It is your responsibility to make sure that your JackParser runs on the University systems. Makefile myxml.cpp myxml.h SyntaxErrors.zip