Assignment title: Information


Preparing a grammar for predictive parsing Write a program in Perl that reads a given grammar from an input file and checks whether the grammar is suitable for building a predictive parser. If it is not, your program should process the input grammar by: (a) removing direct left recursion (b) Performing left factoring, on all the productions (rules) needed. (c) Then your program should write the modified grammar to an output file. Your program must adhere to the set of specifications below. Failure to do so will result in lower grades when marking your solution. 1. The program will take a maximum of two input arguments passed through the command line. 1.1. The first argument is required and represents the name (may include the path) of the .TXT file containing the input grammar. 1.2. The second argument is optional. If provided, it represents the name (may include the path) of the .TXT file containing the output (modified) grammar. 1.3. If the input grammar does not require any modification (meaning it is suitable for a predictive parser), the program should print to the console the following text "Input grammar OK". In this case, there is no need to generate an output file even if the name was provided in the command line. 1.4. If the input grammar needs to be modified and no output file was provided in the command line, your program should write the results into the file _out.txt 2. The program should read the grammar in the input file. Assume the grammar is written in the following format: 2.1. The start symbol is always the left-hand side of the first production. 2.2. Each line in the input file represents a production. 2.3. Non-terminal symbols always have their first character written in uppercase. The rest of the characters could be either lowercase or uppercase. 2.4. Terminal symbols always have all their characters in lowercase. 2.5. The "empty string" symbol (ɛ) is represented as epsilon and is a terminal symbol. 2.6. The left-hand and right-hand sides of each production are delimited by one or more dashes and the ">" symbol. For example, ->, --> and ---------> are all valid delimiters. 2.7. The different alternatives of a production are delimited by the | symbol.2.8. The sequence of terminals and non-terminals in the right-hand side of each production are separated by one or more white spaces. e.g., S --> Term * Term | a + b 2.9. For the sake of simplicity, assume that all alternatives to a non-terminal expansion are expressed in the right-hand side of its (only) production, as opposed to having the same nonterminal associated with multiple productions in the grammar. 2.10. There is no limit to the number of productions, terminals or non-terminals that can be specified as part of the grammar definition. 3. If the grammar expressed in the input file does not comply with any of the specifications above, your program should display the "Wrong format for input grammar" message in the console followed by a list of the format specification(s) that were violated in the grammar definition. 4. If the grammar complies with the input format in step 2, your program should analyze whether it is suitable for predictive parsing. If not, the program should generate a modified grammar with no direct left recursion and having all non-terminals left-factored, if required. 4.1. Check the Syntactic Analysis for the definition of these two pre-processing algorithms. 4.2. Your program is not responsible for removing indirect left recursion but should indicate its presence with the following warning message: "Warning: Indirect left recursion detected in non-terminals:" followed by the list of non-terminals having indirect left recursion. 5. The program should write the modified grammar to the output file as specified in step 1. The output grammar must obey the format specifications in step 2. Example 1 Input: > grammarchecker.pl grammar1.txt Input file grammar1.txt E -> E + T | T T ---> T * F | F F -> ( E ) | id Output file : grammar1_out.txt E -> T EPrime EPrime -> + T EPrime | epsilon T -> F TPrime TPrime -> * F TPrime | epsilon F -> ( E ) | id The E and T productions have been rewritten without direct left recursion.Example 2 Input: > grammarchecker.pl grammar2.txt Input file grammar2.txt E -> E + T | T t ---> T * F | F F -> ( E ) | id Output: Invalid grammar in input file: grammar2.txt Terminal 't' found in the left-hand side of a production Example 3 Input: > grammarchecker.pl grammar3.txt result.txt Input file: grammar 3.txt T -> a P a S P --> P a S | a b S ---> S b b | S b | b Output file: result.txt T -> a P a S P -> a b PPrime PPrime -> a S PPrime | epsilon S -> b SPrime SPrime -> b SPrimePrime | epsilon SPrimePrime -> b SPrime | SPrime The P and S productions have been rewritten without direct left recursion. The S production has also been left-factored. It is OK to use any publicly available pre-built Perl package as an aid to the development of your solution, as long as the functionality included in that package does not exceed 50% of the tasks to be carried out during your assignment. For instance, it is OK to use a Perl package that reads a grammar from the input file or one that eliminates direct left recursion and performs left factoring for a grammar, but not both. If your solution includes these package dependencies, please state them clearly in the preamble of your main Perl file. For example: # dependency: grammar-reader (available at http://package.location.com) # dependency-description: a package that reads a grammar from a text file