5/5 - (1 vote)

Project Step 3 Symbol Table

Due: September 29th

Your goal in this step is to process variable declarations and create Symbol Tables. A symbol table is a data structure that keeps information about non-keyword symbols that appear in source programs. Variables and function names are examples of such symbols. The symbols added to the symbol table will be used in many of the further phases of the compilation.

In Step 2 you didnt need token values since only token types are used by parser generator tools to guide parsing. But in this step your parser needs to get token values such as identifier names and string literals from your scanner. You also need to add semantic actions to create symbol table entries and add those to the symbol table.

Semantic Actions

Semantic actions are steps that your compiler takes as the parser recognizes constructs in your program. Another way to think about this is that semantic actions are code that executes as your compiler matches various parts of the program (constructs like variable declarations or tokens like identifiers ). By taking the right kind of action when the right kind of construct is recognized, you can make your compiler do useful work!

One way to think about when to take actions in your parser is to think about associating actions with nodes of your parse tree. Each element in the parse tree can take an action to generate a semantic record for that node in the parse tree; these actions can include looking at the semantic records generated for the children in the parse tree. For example, consider the parse tree for the following piece of code:

STRING foo := "Hello World";

Which produces the (partial) parse tree below:

Hello World parse tree

We can create semantic records for each of the tokens IDENTIFIER and STRINGLITERAL that record their values (foo and Hello World, respectively), and pass them up the tree so that those records are the records for id and str . We can then construct a semantic record for string_decl using the semantic records of its children to produce a structure that captures the necessary information for a string declaration entry in your symbol table (and even add the entry to the symbol table).

Parser-driven Semantic Actions

It may seem like a lot of work to keep track of pieces of data at each node of the parse tree and assemble them into more complicated records. But most parsers help do this automatically . As they build up the parse tree, they will call actions that execute to collect data from the parse tree and create semantic records. In essence, the parsers perform a post-order traversal of the parse tree as they walk the tree , and store information about the necessary semantic records in a way that you can easily retrieve them. Lets look at how this works for both ANTLR and bison.

For both of these, it is worth remembering two things:

Tokens become leaves in the parse tree. The semantic record for a token is either always the text associated with that token (if youre using ANTLR) or whatever you assign yylvalto in your scanner (if youre using flex/bison).
Everysymbol that shows up in a grammar rule will be a node in your parse tree, and that if you recognize a grammar rule, there will be a node in your parse tree associated with the left-hand-side of the rule and that node will have a separate child for each of the symbols that appear on the right hand side.

ANTLR

In ANTLR, you can put arbitrary Java code in braces ( {} ) at various points in the right hand side of a grammar rule in your .g4 file; this code will execute as soon as the symbol immediately before the braces has been matched (if its a token) or predicted and matched (if its a non-terminal).

The main semantic action for a rule will go at the very end of the rule (in other words, it will execute once the entire rule has been matched). As part of this rule, you can assign a value to the semantic record that will be associated with the left hand side of the rule. You name the semantic record and tell ANTLR what type that record should have using a returns annotation.

You can then extract information from the semantic records of the children by giving variable names to the children you have matched. You can either access that childs semantic record by accessing $a.x where a is the name you gave the child and x is the name you gave to the childs semantic record in the returns annotation, or you can access the characters associated with the child (useful when matching tokens) with $a.text .

So suppose we wanted to return a structure called StrEntry from our string_decl rule. It might look something like this:

string_decl [returns StrEntry s] : STRING id ASSIGN_OP ex=str SEMI{$s = new StrEntry(); s.addID($id.text); s.addValue($ex.text);}

(Note that if you dont give a child a name (like id in the above example), ANTLR defaults to using the name of the token or non-terminal. If the same symbol shows up twice in a rule, you must give them names.)

Bison

In bison, you can put arbitrary C (or C++) code in braces at various points in the right hand side of a grammar rule in your .y file. It works very similarly to ANTLR in that respect. The key different in bison is understanding how to pass data around.

To set the type of a semantic record for a non-terminal, you use a %type command:

%type <s_entry> string_decl%type <s> id str

All of the types that you want to use need to be part of a union that determines the possible types for yylval , which you declare in a %union declaration:

%union{StrEntry * s_entry;string * s;}

So the %type commands and %union command in conjunction mean that the string_decl non-terminal will produce a semantic record named s_entry of type StrEntry * and the non-terminals id and str will produce semantic records of type string * named s .

(You can also do this by making yylval a struct instead of a union if you would rather do that, please discuss with the TA).

Then, when building a semantic action for a rule, $$ refers to the semantic record you are building for the left hand side (whose type is determined by the %type command), and $1 , $2 , etc. refer to the semantic records for the right hand side, listed in order . So here is the equivalent action for creating a StrEntry object for the str_decl rule:

string_decl : STRING id ASSIGN_OP str SEMI {$$ = new StrEntry(); $$->addID($2); $$->addValue($4);};

Symbol Tables

Your task in this step of the project is to construct symbol tables for each scope in your program. For each scope, construct a symbol table, then add entries to that symbol table as you see declarations. The declarations you have to handle are integer/float declarations, which should record the name and type of the variable, and string declarations, which should additionally record the value of the string. Note that typically function declarations/definitions would result in entries in the symbol table, too, but you do not have to record them for this step.

Nested Symbol Tables

In this years variant of Micro, there are multiple scopes where variables can be declared:

Variables can be declared before any functions. These are global variables, and can be accessed from any function.
Variables can be declared as part of a functions parameter list. These are local to the function, and cannot be accessed by any other function.
Variables can be declared at the beginning of a function body. These are local to the function as well.
Variables can be declared at the beginning of a then block, an else block, or a repeat statement. These are local to the block itself. Other blocks, even in the same function, cannot access these variables.

Note that the scopes in the program are nested (function scopes are inside global scopes, and block scopes are nested inside function scopes, or each other). You will have to keep track of this nesting so that when a piece of code uses a variable named x you know which scope that variable is from.

To handle this, we suggest building a tree of symbol tables, where each symbol table, in addition to keeping track of the symbols in the current scope, keeps track of an ordered list of its immediately nested scopes (so the global scope symbol table will also have a list of all the function scope symbol tables).

To keep track of the nesting during parsing, we suggest maintaining a stack of symbol tables: when you enter a new scope, create a new symbol table for that scope and push it on the stack (and make it a child of the scope that was previously on the stack). When you leave a scope, pop the current scope off the table. When you see a new declaration, add it to whatever scope is currently on the top of the stack

What you need to do

You should define the necessary semantic actions and data structures to let you build the symbol table(s) for Micro input programs.

At the end of the parsing phase, you should print out the symbols you found.

For each symbol table in your program, use the following format:

Symbol table <scope_name>name <var_name> type <type_name>name <var_name> type <type_name> value <string_value>;...

The global scope should be named GLOBAL, function scopes should be given the same name as the function name, and block scopes should be called BLOCK X where X is a counter that increments every time you see a new block scope. Function parameters should be included as part of the function scope.

The order of declarations matters! We expect the entries in your symbol table to appear in the same order that they appear in the original program. Keep this in mind as you design the data structures to store your symbol tables.

See the sample outputs for more complete examples of what we are looking for.

CAVEATYou may be tempted to just print declarations as you see them, rather than building an actual tree of symbol tables. While that will suffice for step 3, it will not be sufficientfor later steps of the project. We strongly suggestthat you build the necessary data structures since you will have to, eventually, anyway.

Handling errors

Your compiler should output the string DECLARATION ERROR <var_name> if there are two declarations with the same name in the same scope .

Sample inputs and outputs

The inputs and outputs we will test your program can be found here .

A sample compiler (a .jar file that you can invoke with -jar ) is available here .

What you need to submit

All of the necessary code for your compiler that you wrote yourself. You do not need to include the ANTLR jar files if you are using ANTLR.
A Makefile with the following targets:
1. compiler: this target will build your compiler
2. clean: this target will remove any intermediate files that were created to build the compiler
3. team: this target will print the same team information that you printed in step 0.
A shell script (this mustbe written in bash, which is located at /bin/bashon the ecegrid machines) called runmethat runs your scanner. This script should take in two arguments: first, the input file to the scanner and second, the filename where you want to put the scanners output. You can assume that we will have run make compilerbefore running this script.

While you may create as many other directories as you would like to organize your code or any intermediate products of the compilation process, both your Makefile and your runme script should be in the root directory of your repository.

Do not submit any binaries . Your git repo should only contain source files; no products of compilation or test cases. If you have a folder named test in your repo, it will be deleted as part of running our test script (though the deletion wont get pushed) make sure no code necessary for building/running your compiler is in such a directory.

You should tag your step 3 submission as step3-submission

Project Step 4 Expressions

Due: October 18th

Your goal in this step is to generate code for expressions. To do this, we will build semantic actions that generate code in an intermediate representation (IR) for assignment statements and expressions, and then translate that intermediate representation to assembly code.

We recommend that you do this in three steps, as it will make it easier to debug your code, but you can also choose to do it in two steps (step 1 is optional):

Generate an abstract syntax tree(AST) for the code in your function.
Convert the AST into a sequence of IR Nodesthat implement your function using three address code.
Traverse your sequence of IR Nodes to generate assembly code.

We will discuss each of these steps next. You may also find the Lecture 4 notes helpful.

(Note: in this step, and step 5, we will only have one function in our program, main .)

Abstract Syntax Tree

An Abstract Syntax Tree is essentially, a cleaned up form of your parse tree that more straightforwardly captures the structure of expressions, control constructs, etc. in your program. For many compilers, the AST is the intermediate representation, though we will further convert the AST into another intermediate representation.

What is the difference between a parse tree and an AST ?

Parse trees capture all of the little details necessary to implement your grammar. This means that it often contains extraneous information beyond what is necessary to capture the details of a piece of code (e.g., there are nodes for tokens like ;, and nodes for all of the sub-constructs we used to correctly implement order of operations). ASTs, in contrast, contain exactly the information needed to capture the meaning of an expression, including being structured to preserve order of operations.

For example, consider the parse tree for a + b * c :

Parse tree

Complicated, huh? Heres an abstract syntax tree that captures the same thing:

AST

Much simpler! We arent preserving anything except the bare minimum needed to describe the expression (note that we included the type of each of the variables in the program we can get that information from our symbol table!).

Building an AST

Note that the information in the AST is associated with various nodes in the parse tree. We can use semantic actions, just as we did in Step 3, to pass information up the parse tree to build up the AST. Instead of passing information about a declaration, we can instead pass partially constructed abstract syntax tree nodes (you may want to define a class or structure called ASTNode that can be the return type for the relevant constructs). For example:

add_op: generate an AddExprAST node that has two children (that you leave uninitialized) and keeps track of the operator ( +or -).
expr_prefix: this will have three AST Nodes passed up from its sub rules: one from expr_prefix(which may be NULL, but otherwise will be an AddExprnode missing its right child), one passed up from factor(which will be a complete AST Node with all its fields filled in), and one passed up from add_op(which will be an AddExprnode that has neither its left nor its right child filled in).
1. If expr_prefixis NULL, make the add_opnodes left child the node from factorand return up the add_opnode (note that it wont have its right child filled in!)
2. If expr_prefix isntNULL, note that it will be missing its right child. Make the factornode its right child, then make the expr_prefixnode the add_opnodes left child, which you pass up.

This basic idea: creating AST nodes when you have the information for a new node, then filling in various fields of the node as you work your way up the parse tree, will let you eventually create an AST for all the statements in the function.

Hint: you should also create an AST node to capture lists of statements; each element of the list will point to an AST node for a single assign_stmt.

IR: 3 Address Code

The next step in our compilation process is to generate 3 Address Code (3AC), which is our intermediate representation. 3AC is an intermediate representation where each instruction has at most two source operands and one destination operand. Unlike assembly code, 3AC does not have any notion of registers. Instead, the key to 3AC is to generate temporaries variables that are used to hold the intermediate results of computations. For example, the 3AC for d := a + b * c (where all variables are integers) will be:

MULTI b c $T1ADDI a $T1 $T2STOREI $T2 d

Generating 3AC

Generating 3AC is straightforward from an AST. We can perform a post-order walk of the tree, passing up increasingly longer sequences of IR code called CodeObject s. Each code object retains three pieces of information:

A sequence of IR Nodes (a structure representing a single 3AC instruction) that holds the code for this part of the AST (i.e., that implements this part of the expression)
An indication of where the result of the IR code is being stored (think: the name of the temporary or variable where the result of the expression is stored)
An indication of the type of the result (INT or FLOAT)

Then, when we encounter something like an AddExpr Node, we can generate code for the overall expression as follows:

Create a new CodeObjectwhose code list is all the code from the left child of the AddExprfollowed by all the code for the right child.
Use the resultfields of the left and right CodeObjects to create a new 3AC instruction performing the add, storing the result in a new temporary. Add this new instruction to the end of your code list
Indicate in your CodeObjectthe temporary where the result is stored, and its type.
Return the new CodeObjectup the AST as part of your post-order walk.

Hint: the CodeObjectfor a simple variable wont have any 3AC code associated with it. Instead, mark the variable itself as the temporary the result is stored in.

Hint: You may find it useful to write a helper function to generate fresh temporaries.

Then, when you get to the top of the AST, you will have a single CodeObject that contains all of the IR code for the entire main function.

Note: We are generating code by performing a post-order walk of the AST. You can also generate code using this strategy by performing a post-order walk of the parse-tree (which is why you can optionally skip building the AST).

3AC instructions

Here are the 3AC instructions you should use:

ADDIOP1 OP2 RESULT (Integer add; RESULT = OP1 + OP2)SUBIOP1 OP2 RESULT (Integer sub; RESULT = OP1 - OP2)MULIOP1 OP2 RESULT (Integer mul; RESULT = OP1 * OP2)DIVIOP1 OP2 RESULT (Integer div; RESULT = OP1 / OP2)ADDFOP1 OP2 RESULT (Floating point add; RESULT = OP1 + OP2)SUBFOP1 OP2 RESULT (Floating point sub; RESULT = OP1 - OP2)MULFOP1 OP2 RESULT (Floating point mul; RESULT = OP1 * OP2)DIVFOP1 OP2 RESULT (Floating point div; RESULT = OP1 / OP2)STOREI OP1 RESULT (Integer store; store OP1 in RESULT)STOREF OP1 RESULT (Floating point store; store OP1 in RESULT)READI RESULT (Read integer from console; store in RESULT)READF RESULT (Read float from console; store in RESULT)WRITEI OP1 (Write integer OP1 to console)WRITEF OP1 (Write float OP1 to console)WRITES OP1 (Write string OP1 to console)

Generating Assembly

Once you have your IR, your final task is to generate assembly code. In this class, we will be using an assembly instruction set called Tiny. See the Tiny documentation for details about the instruction set.

This step is fairly straightforward: iterate over the list of 3AC you generated in the previous step and convert each individual instruction into the necessary Tiny code (note that Tiny instructions reuse one of the source operands as the destination, so you may need to generate multiple Tiny instructions for each 3AC instruction).

For now, we will be using a version of Tiny that supports 200 registers, so you can more or less directly translate each temporary you generate into a register.

Testing your Tiny code

You can test your Tiny code by running it on our Tiny simulator . This simulator can be built by running:

> g++ -o tiny tinyNew.C

You can then run a program with tiny commands using:

> ./tiny <code file>

What you need to do

In this step, you will be generating assembly code for assignment statements, expressions, and READ and WRITE commands. Use the steps outlined above to generate Tiny code. Your compiler should output a list of tiny code that we will then run through the Tiny simulator to make sure you generated the right result.

For debugging purposes, it may also be helpful to emit your list of IR code. You can precede a statement with a ; to turn it into a comment that our simulator will not interpret.

Handling errors

All the inputs we will give you in this step will be valid programs. We will also ensure that all expressions are type safe: a given expression will operate on either INTs or FLOATs, but not a mix, and all assignment statements will assign INT results to variables that are declared as INTs (and respectively for FLOATs).

Sample inputs and outputs

The inputs and outputs we will test your program can be found here .

A sample compiler (a .jar file that you can invoke with -jar ) is available here .

Grading

In this step, we will only grade your compiler on the correctness of the generated code. We will run your generated code through the Tiny simulator and check to make sure that you produce the same result as our code. When we say result, we mean the outputs of any WRITE statements in the program (not details such as how many cycles the code uses, how many registers, etc.)

We will not check to see if you generate exactly the same code that we do no need to diff anything. We only care if your generated code works correctly . You may generate slightly different code than we did.

Extra credit

For full credit on this assignment, your generated code merely needs to work properly. We will not consider how fast your code runs. However , we will also evaluate how fast your Tiny code runs (the Total Cycles reported by the Tiny simulator).

The groups whose generated Tiny code runs fastest (averaging across all the inputs) will receive bonus points for this step: 15% for the fastest code, 10% for second, and 5% for third.

What you need to submit

All of the necessary code for your compiler that you wrote yourself. You do not need to include the ANTLR jar files if you are using ANTLR.
A Makefile with the following targets:
1. compiler: this target will build your compiler
2. clean: this target will remove any intermediate files that were created to build the compiler
3. team: this target will print the same team information that you printed in step 0.
A shell script (this mustbe written in bash, which is located at /bin/bashon the ecegrid machines) called runmethat runs your scanner. This script should take in two arguments: first, the input file to the scanner and second, the filename where you want to put the scanners output. You can assume that we will have run make compilerbefore running this script.

You should tag your step 4 submission as step4-submission

Project Step 5 Control Structures

Due: November 1st

This step builds on Step 4. Now that we are able to generate code for lists of statements, what happens if those lists of statements are embedded in control structures? (IF statements and FOR loops)?

(Note: as in step 4, we will only have one function in our program, main .)

ASTs for Control Structures

ASTs for control structures are, intuitively, simple: each control structure will have several children (3 in the case of an IF statement, 4 in the case of a FOR loop) that are themselves ASTs (ASTs for statement lists in the case of the bodies of IF statements and FOR loops, ASTs for conditional expressions in the case of the conditions in the IF statements and FOR loops, etc.). Because you already have working code for building an AST for statement lists (and can readily adapt your code for binary expressions to build ASTs for conditional expressions), all you have to do is create semantic actions for the control structures that stitch together the existing ASTs.

Generating 3AC for Control Structures

Generating 3AC for control structures builds directly on your code for step 4, which is able to generate code for lists of statements. This means that when you are generating code for an IF AST node, you know that the 3AC for the three children already exists. All that is left is to put them together in the correct order (see the Lecture 5 notes for details on this) and insert any necessary labels and jumps.

There are two things that you need to pay attention to when putting together 3AC:

Generating Labels

At various points in your code, you will need to insert labels and jumps to allow control to transfer from one part of your code to another. You will need to make sure that you can generate unique labels every time (since your code will not work properly if there are multiple labels with the same name).

The 3AC you will generate for labels looks like:

LABEL STRING

Where STRING is whatever name you decide to give to your label

Generating Jumps

Unconditional jumps (like you might use to jump over an ELSE block) are easy:

JUMP STRING

Where STRING is the label you want to jump to.

Conditional jumps are a little bit tricky in our 3AC (and in Tiny): you need to generate the right kind of jump:

GTOP1 OP2 LABEL (Jump to Label if OP1 > OP2)GEOP1 OP2 LABEL (Jump to Label if OP1 >= OP2)LTOP1 OP2 LABEL (Jump to Label if OP1 < OP2)LEOP1 OP2 LABEL (Jump to Label if OP1 <= OP2)NEOP1 OP2 LABEL (Jump to Label if OP1 != OP2)EQOP1 OP2 LABEL (Jump to Label if OP1 == OP2)

To generate the right kind of jump for a conditional expression, you will need to inspect the AST node for the comparison operation, and use that information to select the right 3AC instruction.

Note: the 3AC does not preserve type information about what kind of comparison you are doing, but the Tiny code for jumps does; you may want to extend either your 3AC or your data structure to keep track of this information.

Testing your Tiny code

You can test your Tiny code by running it on our Tiny simulator . This simulator can be built by running:

> g++ -o tiny tinyNew.C

You can then run a program with tiny commands using:

> ./tiny <code file>

What you need to do

In this step, you will be generating assembly code for IF statements and FOR loops. Use the steps outlined above to generate Tiny code. Your compiler should output a list of tiny code that we will then run through the Tiny simulator to make sure you generated the right result.

For debugging purposes, it may also be helpful to emit your list of IR code. You can precede a statement with a ; to turn it into a comment that our simulator will not interpret.

Handling errors

Sample inputs and outputs

The inputs and outputs we will test your program can be found here . Note that in the inputs subdirectory, files with extension .micro are the test programs, and files with extension .input are sample inputs that work with the test programs (i.e., provide inputs for READ commands).

A sample compiler (a .jar file that you can invoke with -jar ) is available here .

Grading

Extra credit

The groups whose generated Tiny code runs fastest (averaging across all the inputs) will receive bonus points for this step: 15% for the fastest code, 10% for second, and 5% for third.

What you need to submit

All of the necessary code for your compiler that you wrote yourself. You do not need to include the ANTLR jar files if you are using ANTLR.
A Makefile with the following targets:
1. compiler: this target will build your compiler
2. clean: this target will remove any intermediate files that were created to build the compiler
3. team: this target will print the same team information that you printed in step 0.
A shell script (this mustbe written in bash, which is located at /bin/bashon the ecegrid machines) called runmethat runs your scanner. This script should take in two arguments: first, the input file to the scanner and second, the filename where you want to put the scanners output. You can assume that we will have run make compilerbefore running this script.

You should tag your step 5 submission as step5-submission

Whatsapp Us

[SOLVED] ECE 468 Symbol Table Expressions Control Structures

Due: September 29th

Semantic Actions

Parser-driven Semantic Actions

ANTLR

Bison

Symbol Tables

Nested Symbol Tables

What you need to do

Handling errors

Sample inputs and outputs

What you need to submit

Due: October 18th

Abstract Syntax Tree

What is the difference between a parse tree and an AST ?

Building an AST

IR: 3 Address Code

Generating 3AC

3AC instructions

Generating Assembly

Testing your Tiny code

What you need to do

Handling errors

Sample inputs and outputs

Grading

Extra credit

What you need to submit

Due: November 1st

ASTs for Control Structures

Generating 3AC for Control Structures

Generating Labels

Generating Jumps

Testing your Tiny code

What you need to do

Handling errors

Sample inputs and outputs

Grading

Extra credit

What you need to submit

Reviews

Related products

[Solved] Problem 3: Who are the Winners

[Solved] Modify your first program to print a table of the words

[SOLVED] Naughty Receiver – Reliable Data Transfer

[SOLVED] SciCalculator

[Solved] Program that has three functions: sepia(), remove_all_red(), and gray_scale()

[Solved] Payroll calculation program-Python