[SOLVED] shell XML compiler 2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)

$25

File Name: shell_XML_compiler_2019/10/26_Workshop_10__Recursive_Descent_Parsing:_Computer_Systems_(2000_7081_Combined).zip
File Size: 1007.94 KB

5/5 - (1 vote)

2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)
Workshop 10 Recursive Descent Parsing
Recursive Descent Parsing Overview
In this workshop you will write a recursive descent parser for a simple language. To complete the workshop you just need to complete the bodies of the parseXXX() functions in the file parser.cpp. There is one function for each term in the grammar being parsed and the first parse function that will be called is parseProgram().
To simplify the task we have provided a library of useful functions that we use in most programming workshops and assignments. The library implements a tokeniser for the example language, an abstract syntax tree that can be read or written as XML, symbol tables and an output buffer. The library functions that you can use are described in the .h files in the includes sub-directory. The tokeniser functions are described in the tokeniser.h file, the abstract syntax tree functions are described in the abstract-syntax- tree.h file, the symbol tables are described in the symbols.h file and the output buffer is described in the iobuffer.h file. You may not need to use all of this functionality in this workshop.
Participation Marks
Up to 5 participation marks are available for every workshop, 2 for preparation, 1 for attendance and 2 for completing an activity.
Attendance
One participation mark will be awarded if you attend a workshop and your presence is recorded using our practical marker (https://cs.adelaide.edu.au/services/pracmarker/) . Do not click on flag me for marking button until a supervisor is standing next to you and is ready to record your presence. The flag is only to speed up the data entry and is not visible for very long.
Workshop 10 Background Reading
Before attending the workshop you should read the full workshop description, review the startup files provided below and you may wish to re-read chapter 10 of the textbook.
Workshop 10 Preparation Activity
Two participation marks will be awarded for completion of this preparation activity if it is completed at least 10 minutes before the workshop that you attend but not more than 1 week before.
Note: in example commands % is the shells prompt, it is not part of the command.
Note: the web submission system will record 0 marks for completing this activity, the 2 marks are awarded later when your attendance is recorded by the pracmarker but, only if the activity was completed before attending a workshop.
1. Make a new directory in your svn repository to keep your files for this workshop:
https://myuni.adelaide.edu.au/courses/44936/pages/workshop-10-recursive-descent-parsing?module_item_id=1374190 1/6

2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)
2. Check out a working copy of your new svn directory.
3. Change to the workshop10 directory:
% cd ~/cs-projects/workshop10
4. Copy the workshop 10 startup files into the workshop10 directory. The startup files are in the zip file attached below.
5. Compile and a skeleton version of the parser program to be implemented in this workshop using the command:
% make notest
The executable program, parser, is compiled using the file parser.cpp and the precompiled library file lib/lib.a. The main function in the parser program calls parser_xml() to constructs simple abstract syntax tree representing the input and then calls ast_print_as_xml() to print an XML representation of the constructed tree.
6. Add the new files to your svn repository:
7. Goto the Web Submission System and make a submission to the Workshop 10 assignment. A successful submission that passes the preparation tests will complete the preparation activity.
Workshop 10 Activity
Two participation marks will be awarded for completing the following activity. This need not be completed during the workshop but no participation marks will be awarded if the activity is not completed by Friday 5pm of week 10.
After completing each part of the following activity, commit your changes to svn:
% svn commit -m workshop10-activity
After each commit, go to the Web Submission System and make a submission to the Workshop 10 assignment. A successful submission that passes any of the parser tests for inputs example1.test, example2.test or example3.test will complete the workshop activity.
Part 1 Complete the Parser
Write a recursive descent parser for the following language.
Term Definition
program ::= declarations statement declarations ::= (var identifier ;)*
% svn co https://version-control.adelaide.edu.au/svn/ /2019/s2/cs/workshop10 ~/cs-projects/
workshop10
% svn add Makefile Makefile-extras bin includes lib lib/lib.a tests parser parser.cpp
% svn commit -m Workshop 10 Startup Files
% svn mkdir -m workshop10 https://version-control.adelaide.edu.au/svn/ /2019/s2/cs/workshop
10
https://myuni.adelaide.edu.au/courses/44936/pages/workshop-10-recursive-descent-parsing?module_item_id=1374190 2/6

2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)
statement ::=
whileStatement ::= ifStatement ::= letStatement ::= statementSequence ::= expression ::= condition ::= term ::=
Program Structure
whileStatement | ifStatement | letStatement |
{ statementSequence }
while ( condition ) statement
if ( condition ) statement (else statement)? let identifier = expression ;
statement*
term (op term)?
term relop term
identifier | integer
So that you do not have to worry about how to tokenise the input, we have provided a working tokeniser. The most useful functions, next_token(), mustbe() and have().
next_token() calls the tokeniser and returns the next token in the input. The token kinds can be found in the file includes/tokeniser.h. The functions token_kind(), token_spelling() and token_ivalue() can be used to inspect the contents of a token object. If there are any errors constructing the next token or the end of the input is reached, the returned token kind will be tk_eoi.
mustbe() is used when you know what the last symbol read must be. The function will check whether or not the last symbol read is the expected symbol and will terminate the program with an error if it is not. If no error occurred, it will then read the next symbol from the input but return the matching token.
have() is used when you want to know if a particular symbol has just been read. The function will check whether or not the last symbol read is the expected symbol. If the expected symbol was read, the function returns true, otherwise it returns false.
There are some situations where you want to check if a token is one of a number of possibilities such as a relational operator or the start of one of the statements. To accommodate this, the tokeniser has four token kinds, tk_statement, tk_infix_op, tk_relop and tk_term, that can be used with mustbe() and have(). The mustbe() function will advance the input if it finds matching token but the parser may need to know which token it found. For this reason, the mustbe() function returns the token that it found.
Parsing Functions
In a recursive descent parser we start by writing a function for each term in the programming languages grammar which is responsible for parsing that term. In this parser, each function is also going to return an abstract syntax tree node representing what it just parsed. For example, if parsing the grammar shown above, we would start with the functions:
ast parseProgram() ;
ast parseDeclarations() ; ast parseStatement() ;
ast parseWhileStatement() ; ast parseIfStatement() ;
ast parseLetStatement() ;
https://myuni.adelaide.edu.au/courses/44936/pages/workshop-10-recursive-descent-parsing?module_item_id=1374190 3/6

2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)
ast parseStatementSequence() ; ast parseExpression() ;
ast parseCondition() ;
ast parseTerm() ;
To complete this workshop you need to complete the bodies of these parse functions.
The parser_xml() function initialises the tokeniser for you with a call to next_token() and then calls
the parseProgram() function. Example Parse Function
To illustrate how we use the have() and mustbe() functions, here is what the parseStatementSequence() function could look like:
// statementSequence: { statement* }
ast parseStatementSequence()
{
}
vector seq ;
mustbe(tk_lcb) ;
while ( have(tk_statement) )
{
seq.push_back(parseStatement()) ;
mustbe(tk_rcb) ;
}
return create_statements(seq) ;
The statementSequence rule states that a statementSequence must start with a { (tk_lcb), it may be followed by 0 or any number of statements and then finishes with a } (tk_rcb). Therefore our parse function starts with a call of mustbe(tk_lcb) so that if that is not the next token a fatal error is reported. The input will then have moved onto the next token which should be either the start of a statement or a }. There is a group token tk_statement that matches any token that can start a statement so we use a call of have(tk_statement) to tell if the next token starts a statement. While the next token is the start of a statement we go around a while loop parsing statements by
calling parseStatement(). When the next token cannot start a statement, the while loop terminates. If the input is correct then there must be a } to complete the statement sequence. Therefore the last parsing step is to call mustbe(tk_rcb) so that if that is not the next token a fatal error is reported.
This example also shows the approach taken to creating abstract syntax tree nodes. In each parse function we collect the components of a tree node and only create the node as the last step. In this case, we do not know how many times we will call parseStatement() to create statements nodes, so we collect them in a vector which is finally passed to the create_statements() function.
Variable Declarations
As part of writing the parser you must construct tree nodes to represent the variables that encode their name, their type, their segment and their offset within their segment. In the simple language all variables are stored in the local segment and are of type int. This can be achieved by using a symbol table to record which variables have been declared and where the variables are actually stored in the local segment.
https://myuni.adelaide.edu.au/courses/44936/pages/workshop-10-recursive-descent-parsing?module_item_id=1374190 4/6

2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)
When a new variable is declared the declare_variable() function is called to add the variable to the symbol table. The variable is allocated the next free memory location in the local segment. If a variable has already been declared an error message is printed and the program will exit. Once the variable has been added to the symbol table, declare_variable() returns a tree node representation of the variable.
When a variable is used in a statement or expression, the lookup_variable() function is called to find out where the variable is stored in the local segment. If the variable has not been declared an error message is printed and the program will exit. If the variable is present in the symbol table, lookup_variable() returns a tree node representation of the variable.
Part 2 Automatic Testing
The completed parsing program reads from standard input and writes an XML representation of abstract syntax tree to standard output. Next week, you will write a code generator that walks one of these trees and produces VM code that implements the original program.
So that you can test your parser, we have provided three example programs in the tests directory in the zip file attached below. The first and third programs should parse without error, whilst the second should fail because it contains more than one statement after the declarations. You can test your parser as follows:
You should expect the initial version of your parser to fail all of the tests. Note that when you run the make command, make will first attempt to compile your program. If the program does not compile the tests will not be run.
You can copy-paste the individual test commands if you want to see the actual output from a test. For example, if your parser is working you could see:
If you want to add more tests of your own, place the test input file in the tests directory with a name ending in .test. Then run the following command to generate the expected test output:
% make test-add
Part 3 Error Handling
If you have completed steps 1 and 2 it is instructive to consider how to report syntax errors. For example, what information goes into the error message, how do you represent the location of the error and do we
% make
student parser against errors files
| Test Input | parser|Expected Test Output |Test Result
Checking cat tests/example1.test | ./parser | diff tests/example1.test.Pxml test failed Checking cat tests/example2.test | ./parser | diff tests/example2.test.Pxml test failed Checking cat tests/example3.test | ./parser | diff tests/example3.test.Pxml test failed
% cat tests/example2.test | ./parser
***** Error:
3: let x = 1 ;
4: let
^
Found token let but expected end of input
https://myuni.adelaide.edu.au/courses/44936/pages/workshop-10-recursive-descent-parsing?module_item_id=1374190 5/6

2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)
attempt to continue parsing so we can identify as many syntax errors as possible?
Recovering from a syntax error may not always be possible because the parser may not be able to tell what the programmer was trying to write and therefore how to correct the error. There are number options available. The easiest option is to simply give up completely so the compiler never has to handle more than one error. If the compiler writer wants to look for other errors there are two main approaches. Either pretending an expected symbol was present after all and continuing to parse the program on that basis. Alternatively, assuming that some extra input is present and simply deleting all input until the expected symbol is finally found. Many compilers adopt a combination of both.
The way to implement the reporting and recovery is by writing your own version of the mustbe() function. When the first error is discovered, your mustbe() function simply reports the error and returns. This in effect, simulates inserting the missing symbol into the program and continuing as if the program was correct. The second time an error is discovered by your mustbe() function, the error is reported but, before it returns, your mustbe() function reads and discards tokens until either it finds the expected token or the end of the input.
For this step, write your own mustbe() function so that it can recover from syntax errors as just described. If you now give your parser an incorrect program, what error messages would help you as a programmer work out what your mistake was? You may need to use a different name for your mustbe() function and edit your parser.cpp file to use the new name throughout.
Startup Files
The startup files in the attached zip file should work on most 64-bit Linux systems and on a Mac. Please see the Startup Files for Workshops and Assignments (https://myuni.adelaide.edu.au/courses/44936/pages/startup-files-for-workshops-and-assignments) page for more information.
workshop-parser.zip (https://myuni.adelaide.edu.au/courses/44936/files/5215186/download?wrap=1)
https://myuni.adelaide.edu.au/courses/44936/pages/workshop-10-recursive-descent-parsing?module_item_id=1374190 6/6

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] shell XML compiler 2019/10/26 Workshop 10 Recursive Descent Parsing: Computer Systems (2000_7081 Combined)
$25