Assignment title: Information
Principles of Programming Languages winter 2017
Assignment 6 - Programming Part Due: 11 April 2017
This is the final programming assignment, in which you will build a symbol table and an
interpreter. It is due on the last day of classes.
A brief overview of the interpreting process you will complete building with this assignment:
(1) First you will need the symbol table:
build a library of predicates for managing a symbol table. The symbol table will
be built using association lists and stored in global memory space. Write the
symbol table in Prolog, using predicates (no grammars) and save as a file named
symbol table.pl. The file should contain the following predicates:
• create empty table/0: Creates an empty association list and stores it in a
global variable for use as a symbol table.
• initialize functions/1: Accepts a list of functions as the argument and
stores them in the symbol table.
• add symbol/2: Accepts a key, value pair as arguments and adds a symbol to
the table in the form: key ! [value|Rest].
• add symbol list/2 Accepts a list of keys as the first argument and a list of
values as the second argument and adds every (key, value) pair to the symbol
table as required by add symbol/2
• remove symbol/1: Accepts a key as an argument and removes the most recently added value for that key such that the symbol table is changed from
key ! [value|Rest] to key ! Rest. get symbol/2: This predicate accepts
a key as the first argument and unifies a value, as the second argument, to
the first value associated with key in the symbol table. The symbol table entry
should be structured: key ! [valuejRest]
12
(2) Finally, the Interpreter:
The source file is a program, with a single function.
int main (int input ) = input + 3
This is input into the Tokenizer, which outputs a list of tokens, stored as:
[TYPE INT,IDENTIFIER,OPEN P,TYPE INT,IDENTIFIER,CLOSE P,ASSIGN,IDENTIFIER,ARITH ADD,INTEGER]
This label list is used as the input to the parser (the grammar). Since the source
tokens are converted into labels, it is now easy to check if the token list is grammatically correct i.e. the tokens are in the right places). The parser matches tokens in
a left-to-right recursive manner. As input reduces to known terminal labels, Prolog
unifies and produces an output list. If you designed your grammar?s output to
have the format suggested in the assignments and tutorials, the structured list your
parser outputs might look something like:
[ [ [ int,id ],(,[ [ int,id], [ ] ],),=,[ [ id,[ ] ],[ +,num] ] ] ]
The above list represents a list of functions, and each list within a function represents a production. Seeing which list corresponds to which production should be
fairly clear. In the structured list provided by the parser, instead of variable/function
names and numerical values, the list still contains the placeholders id and num. This
means you have to refer back to your original code and copy the original values into
the current output.
Since the values stored in the original token list are known, they can be retrieved
from there and place them in the appropriate places in the structured list.
Some observations about this process:
• The number of atoms in our structured list is equal to the number of elements
in the token list: The parser does not add or remove tokens, it just verifies that
the tokens are in the correct positions.
• The left-to-right order of the tokens in the token list is the same as the left-toright order of the atoms in the structured list: The parser does not change any
token positions.
• If you created a structured list using the tokens from the language specification
(so that the structured list tokens match the source file tokens) then you can
copy every token from the token list into the structured list as you find them.
For example, if the first atom you find in the structured list, starting from the
left is int, then the first element of the token list is also an int. This means
that instead of checking for the id and num tokens, you can simply copy every
token as its found and the structured list will be correct.3
• As you replace terms from left-to-right you will have to recursively move into
nested lists. This means you will need to pass your token list downward through
every recursion and whenever you reach the end of a list you must return the
remaining tokens to the level above. If you do not do this, Prolog will just
continue using whatever the current token list is at higher levels, even if you
have removed tokens from the current token list at the lower levels.
Once you have an algorithm for infusing token list values into your structured list,
you should have a new structured list with the correct names and values inserted,
such as:
[ [ [ int,main ],(,[ [ int, input ], [ ] ],),=,[ [ input, [ ] ],[ +,3] ] ] ]
Since this is known to be a list of functions, you can use your symbol table predicates to move each function (in this case just one) into memory (See Tutorial 9 for
details). Once this is done you can run the interpreter and obtain results.
If the previous components are properly constructed, according to the specifications, the interpreter is quite straight forward to build. Once the functions are stored
in the symbol table, the interpreter will accept a list of integers as starting parameters and use them to call the main predicate. The body of the main function
will then be executed and it will perform a series of actions which reduces the body
to a single integer value, which is then returned as the result. The interpreter
should then verify that a proper return value is found and then return it as the final
result.
When constructing the interpreter, the following language rules apply:
• Every source file must contain at least once function and one of those functions
must be called main with a return type of int. The main function may contain
any number of parameters.
• The body of each function will be calculated until a single value is retrieved.
This is the result for the function body. Once a result is reached, you must
verify that it matches the return type of the function. A function can return
either a bool or int as an integer value. If the return type is bool, the result
must be 0 (false) or 1 (true). If the return type is int, the value may be any
integer.
Steps for calling a function:
(a) You must accept a list of integers as parameters (value list) and the name of a
function.
(b) Retrieve the function?s data from the symbol table (return value, parameter
list, function body).4
(c) Assume the value list elements will be in the same order as the parameter list
entries. The parameter list entries should be in the form of [Type, Name]. You
will need to: (i) Verify that for each value in the value list, the integer value is
of the same type (bool or int) as the corresponding parameter type from the
function data.
(d) If the types match, store the value in the symbol table as a (Name ! Value)
pair.
(e) Once the expression produces a value, ensure its type matches the function's
return type.
(f) If the types match, remove each parameter in the parameter list from the
symbol table.
(g) Return the result of expression.
When constructing the interpreter's main execution logic, note that the body
of a function is an Expression (from the language grammar), which is stored as a
list with a specific number of elements. For example, examine the main function
body, used above:
[ [ input, [ ] ],[ +,3] ] ] ]
For this particular body, use the production: Expression ! Value Extra
Expression. Observe that for this particular production is a list of two elements,
a predicate that handles an expression of two components and produces a result is
needed:
expressionHandler( [ Value, ExtraExpression ], Result ) :-
valueHandler( Value, ValueResult ),
extraExpressionHandler( ExtraExpression, ExtraResult ),
evaluate( ValueResult, ExtraResult, Result ).
Since a predicate can be written for each production, the valueHandler will take
in a Value production list and break that down to an integer (a value is either
a number or an identifier, which is a number stored in a symbol table). An
ExtraExpression can be an arithmetic function or nothing. With these in hand,
you can evaluate the outcome of all of our components to produce a result. For this,
the evaluate clause needed to handle the code would be:
evaluate( Value1, [ '+', Value2 ], Result ) :-
Result is Value1 + Value2.
Writing the interpreter in this fashion means that every predicate is a black box
that always returns a result. If you are in expressionHandler and you know what
kind of result the decomposition predicates will produce, you should be able to handle the values and produce results with little trouble.5
The deliverables for this assignment:
(a) The symbol table, saved as a file named symbol table.pl.
(b) Create a file named interpreter.pl to contain all the function calling and
code execution predicates. The predicates may be named however you choose
and can accept any number of parameters. Ensure your interpreter follows the
requirements outlined on pages 2-4 of this document. Any other additional
work is at your own discretion.
(c) Finally, create a language executor.pl file which includes a run program/3
predicate as follows:
run program( FileName, Arguments, Result )
The first argument accepts an atom as a file-name to a source code file, the
second argument is a list of integers to be passed to the main function, and the
third argument should be the result of the code execution in the interpreter.
This predicate should be the main logic of the pipeline as outlined in the figure
on page 1.Via this predicate, tokenize the source file, move the tokens to the
lexer, move the label list to the parser, etc., to finally unify Result to the
result of the interpreted code provided by the file and the arguments.
Note: When you submit your code, be sure to include ALL files required for
execution
(tokenizer, lexer, etc.)..