Preparing a grammar for predictive parsing
Write a program in Perl that reads a given grammar from an input file and checks whether the
grammar is suitable for building a predictive parser. If it is not, your program should process the
input grammar by:
(a) removing direct left recursion
(b) Performing left factoring, on all the productions (rules) needed.
(c) Then your program should write the modified grammar to an output file.
Your program must adhere to the set of specifications below. Failure to do so will result in lower
grades when marking your solution.
1. The program will take a maximum of two input arguments passed through the command line.
1.1. The first argument is required and represents the name (may include the path) of the
.TXT file containing the input grammar.
1.2. The second argument is optional. If provided, it represents the name (may include the
path) of the .TXT file containing the output (modified) grammar.
1.3. If the input grammar does not require any modification (meaning it is suitable for a
predictive parser), the program should print to the console the following text "Input
grammar OK". In this case, there is no need to generate an output file even if the name was
provided in the command line.
1.4. If the input grammar needs to be modified and no output file was provided in the
command line, your program should write the results into the file
_out.txt
2. The program should read the grammar in the input file. Assume the grammar is written in the
following format:
2.1. The start symbol is always the left-hand side of the first production.
2.2. Each line in the input file represents a production.
2.3. Non-terminal symbols always have their first character written in uppercase. The rest of
the characters could be either lowercase or uppercase.
2.4. Terminal symbols always have all their characters in lowercase.
2.5. The "empty string" symbol (ɛ) is represented as epsilon and is a terminal symbol.
2.6. The left-hand and right-hand sides of each production are delimited by one or more
dashes and the ">" symbol.
For example, ->, --> and ---------> are all valid delimiters.
2.7. The different alternatives of a production are delimited by the | symbol.2.8. The sequence of terminals and non-terminals in the right-hand side of each production
are separated by one or more white spaces. e.g., S --> Term * Term | a + b
2.9. For the sake of simplicity, assume that all alternatives to a non-terminal expansion are
expressed in the right-hand side of its (only) production, as opposed to having the same nonterminal associated with multiple productions in the grammar.
2.10. There is no limit to the number of productions, terminals or non-terminals that can be
specified as part of the grammar definition.
3. If the grammar expressed in the input file does not comply with any of the specifications above,
your program should display the "Wrong format for input grammar" message in the
console followed by a list of the format specification(s) that were violated in the grammar definition.
4. If the grammar complies with the input format in step 2, your program should analyze whether it is
suitable for predictive parsing. If not, the program should generate a modified grammar with no
direct left recursion and having all non-terminals left-factored, if required.
4.1. Check the Syntactic Analysis for the definition of these two pre-processing algorithms.
4.2. Your program is not responsible for removing indirect left recursion but should indicate
its presence with the following warning message: "Warning: Indirect left
recursion detected in non-terminals:" followed by the list of non-terminals
having indirect left recursion.
5. The program should write the modified grammar to the output file as specified in step 1. The
output grammar must obey the format specifications in step 2.
Example 1
Input:
> grammarchecker.pl grammar1.txt
Input file grammar1.txt
E -> E + T | T
T ---> T * F | F
F -> ( E ) | id
Output file : grammar1_out.txt
E -> T EPrime EPrime -> + T EPrime | epsilon
T -> F TPrime TPrime -> * F TPrime | epsilon
F -> ( E ) | id
The E and T productions have been rewritten without direct left recursion.Example 2
Input:
> grammarchecker.pl grammar2.txt
Input file grammar2.txt
E -> E + T | T
t ---> T * F | F
F -> ( E ) | id
Output:
Invalid grammar in input file: grammar2.txt
Terminal 't' found in the left-hand side of a production
Example 3
Input:
> grammarchecker.pl grammar3.txt result.txt
Input file: grammar 3.txt
T -> a P a S
P --> P a S | a b S ---> S b b | S b | b
Output file: result.txt
T -> a P a S
P -> a b PPrime
PPrime -> a S PPrime | epsilon S -> b SPrime SPrime -> b SPrimePrime |
epsilon SPrimePrime -> b SPrime | SPrime
The P and S productions have been rewritten without direct left recursion. The S production has also
been left-factored.
It is OK to use any publicly available pre-built Perl package as an aid to the development of your
solution, as long as the functionality included in that package does not exceed 50% of the tasks to be
carried out during your assignment. For instance, it is OK to use a Perl package that reads a grammar
from the input file or one that eliminates direct left recursion and performs left factoring for a
grammar, but not both.
If your solution includes these package dependencies, please state them clearly in the preamble of
your main Perl file. For example:
# dependency: grammar-reader (available at
http://package.location.com) # dependency-description: a package
that reads a grammar from a text file