Code Generation for Decaf Expressions
Start on Jun 25, 2019 Due on Jul 9, 2019
Your task for this homework is to use LLVM in order to write the code generation step for expressions and methods in the Decaf programming language (decafspec.html).
LLVM
We will be using a code generation and compiler toolkit called LLVM (http://llvm.org/) for this homework. We will be using LLVM version 8.0.0.
Before you start programming for this homework it is very important you work through the LLVM practice problems (llvm-practice.html) first.
We will be using various LLVM tools such as llvm-as and llc . The easiest way to access the right version of these binaries is to use llvm-config bindir and add this directory as a prefix to the LLVM binaries.
We can also use llvm-config to generate the necessary command line options for g++ or clang++
The text generated by the above command can be used to tell g++ or clang++ which LLVM libraries to link with your program and where they can be found. Getting Started
You must have git and python (3.x) on your system to run the assignments. Once youve confirmed this, run this command:
In the decafexpr directory you will find various python programs which you will use to test your solution to this homework.
If you have already cloned the repository earlier you can get the new homework files by going to the directory where you cloned the repository earlier and then
doing:
To get started with your homework do the following steps.
Copy over files to your repository
Assuming you have set up your repository using the instruction in HW0 (hw0.html), clone your repository and enter that directory and copy over the decafexpr files:
If you update my repository using git pull then you might have to copy over the new files into your repository. Be careful you do not clobber your own files in the answer directory.
Default solution
Your solution must be compiled in the answer directory and must be called decafexpr . There is an incomplete solution to this homework in the answer directory. You can create the default binary as follows:
The Challenge
The goal of this homework is to write a code generator for variables, simple expressions and methods for the Decaf programming language. The output will be in LLVM assembly which is compiled to x86 assembly and then to a binary file. The first step will be to write a symbol table implementation for your compiler. The structure of Decaf and code generation hints are given in the Decaf specification:
llvm-config cppflags ldflags libs core mcjit native
git clone https://github.com/anoopsarkar/compilers-class-hw.git
# go to the directory where you did a git clone earlier
git pull origin master
git clone [email protected]:YOUR_USERNAME/CMPT379-1194-YOUR_USERNAME.git
cd CMPT379-1194-YOUR_USERNAME
mkdir -p decafexpr
cd decafexpr
cp -r /your-path-to/compilers-class-hw/decafexpr/* .
git add *
git commit -m initial commit
git push
cd your-repo-name/answer
make default
Decaf specification (decafspec.html)
Read the specification carefully before you attempt to write any code to solve this homework.
Your Task
Using the Decaf language specification (decafspec.html) as your guide, you have two steps:
Step 1: Symbol Table
Implement a symbol table that can keep track of variables and methods in Decaf.
A symbol table is a mapping from identifiers to any information that the compiler needs for code generation. A symbol table is easily implemented using hash tables or maps, e.g. here is a declaration of a symbol table using STL in C++:
where a descriptor is a structure or class which contains useful information about the variable including a type, a register destination, a memory address location, and a variable spilled indicating if the value is to be found in a register or in memory (note that we will not use memory locations for variables until later).
In Decaf we are allowed to shadow a variable declaration (https://en.wikipedia.org/wiki/Variable_shadowing). This means that a new definition for an identifier in a block will cause a new descriptor to be associated with the identifier, but once the block terminates the previous descriptor for the identifier has to be restored. A simple way to implement this notion of local scoping is to specify that each block can create a new symbol table in a list:
If a variable has a local definition that shadows another definition of the same variable name, we pick up the most recently defined descriptor for that variable by simply scanning the list of symbol tables starting from the most recent one:
The full example of how to use this data structure is available as a github gist (https://gist.github.com/anoopsarkar/0a570c6bca560412048cb81401349d38). This is just one way to implement a symbol table. You can implement it any way you like, as long as it can handle shadowing of variables.
For this first step, you can assume that the identifiers are only variables, but for solving this homework (including Part 2) the symbol table will also store information about function names, global variables, etc., and additional information will have to be added to the descriptor.
To test your symbol table implementation, use it to remember the line number of each variable definition and then introduce a comment line into the input Decaf program that specifies the line number of the variable definition for that identifier.
When entering a new block you should make sure to insert a new symbol table so that local re-definitions of a variable will shadow higher level definitions. In order to do this in yacc the easiest way is to add a new non-terminal that can be used to trigger the action you want to do: either insertion of a new symbol table or removal of a symbol table for an out of scope block. For example, lets take a single CFG rule in a yacc program:
If we wish to create a new scope region when we scan the { it is easier if the grammar is changed to the equivalent form below:
But now begin_block and end_block can support actions such as creation of a new scope or removal of an old scope.1 Here is an example of what Step 1 should do. For input Decaf program:
Your program should print out the following on standard error:
typedef map
typedef list
symbol_table_list symtbl;
descriptor* access_symtbl(string ident) {
for (auto i : symtbl) {
auto find_ident = i.find(ident);
if (find_ident != i.end()) {
return find_ident->second;
}
}
return NULL;
}
block: { var_decl_list statement_list }
block: begin_block var_decl_list statement_list end_block
begin_block: {
end_block: }
extern func print_int(int) void;
package C {
func main() int {
var x int;
x = 1;
print_int(x);
}
}
defined variable: x, with type: int, on line number: 4
And on standard output:
There are some additional testcases available in the decafsym directory in the compilers-class-hw git repository.
Step 2: Code Generation for Expressions
Provide code generator for the following fragment of Decaf that includes:
Arithmetic and Boolean expressions.
Warning: do not attempt short-circuit of boolean expressions (https://en.wikipedia.org/wiki/Short-circuit_evaluation) for this homework. Function calls.
Function definitions (including recursive functions).
Declaration of extern functions (all extern functions are defined in decaf-stdlib.c ).
The following AST nodes should have a fully working code generation implemented (except for short-circuiting).
extern func print_int(int) void;
package C {
func main() int {
var x int;
x = 1; // using decl on line: 4
print_int(x) // using decl on line: 4;
}
}
1 Implement code generation for the following AST nodes.
2 It is a subset of the nodes in the full Decaf.asdl AST definition.
3
4 module Decaf
5{
6 prog = Program(extern* extern_list, package body)
7
8 extern = ExternFunction(identifier name, method_type return_type, extern_type* typelist) 9
10 package = Package(identifier name, field_decl* field_list, method_decl* method_list) 11
12 extern_type = VarDef(StringType) | VarDef(decaf_type)
13
14 method_decl = Method(identifier name, method_type return_type, typed_symbol* param_list, method_block block) 15
16 method_block = MethodBlock(typed_symbol* var_decl_list, statement* statement_list)
17
18 block = Block(typed_symbol* var_decl_list, statement* statement_list) 19
20 statement = assign
21 | method_call
22 | ReturnStmt(expr? return_value)
23 | block
24
25 assign = AssignVar(identifier name, expr value) 26
27 expr = rvalue
28 | method_call
29 | NumberExpr(int value)
30 | BoolExpr(bool value)
31 | BinaryExpr(binary_operator op, expr left_value, expr right_value)
32 | UnaryExpr(unary_operator op, expr value)
33
34 rvalue = VariableExpr(identifier name)
35
36 decaf_type = IntType | BoolType
37
38 method_type = VoidType | decaf_type
39
40 typed_symbol = VarDef(identifier name, decaf_type type)
41
42 method_call = MethodCall(identifier name, method_arg* method_arg_list) 43
44 method_arg = StringConstant(string value)
45 | expr
46
47 bool = True | False 48
Most of these are trivial to implement using the LLVM API. Only the first one, arithmetic and boolean expressions is non-trivial. The Decaf language specification has a section on Semantics which lays out the rules of type coercion and other gray areas in the implementation of Decaf.
Boolean constants are of type i1 in LLVM assembly. Char constants can be either i8 or i32 . You will need to refer to the LLVM Documentation (http://releases.llvm.org/8.0.0/docs/index.html) and the LLVM Tutorial (http://llvm.org/releases/8.0.0/docs/tutorial/index.html).
As shown in LLVM practice (llvm-practice.html) you should extend your abstract syntax tree (AST) classes to add LLVM API calls for code generation. Keep all the AST classes from your parser in HW2. For the AST nodes not mentioned above just return nullptr or NULL as the value for the code generation function call. For the rest you should make sure the code generation function returns the appropriate LLVM code block of type LLVM::Value * .
Requirements
More details about the task is provided by examining the testcases for this homework.
The output should be in LLVM assembly which can be compiled to x86 assembly using the LLVM tools and run as a binary. We will use the binary llvm-run in the answer directory to create and run the binary from the Decaf programs in the testcases directory.
The LLVM assembly and toolchain output is dumped into the llvm directory. You should examine your output to debug your compiler. Make sure you obey the following requirements:
1. If your program succeeds in parsing the input you should exit from your program using exit(EXIT_SUCCESS) . And if your program finds an error in the input Decaf program you should exit using exit(EXIT_FAILURE) . The definitions of EXIT_SUCCESS and EXIT_FAILURE are in cstdlib (for C++) and in stdlib.h (for C).
2. You must dump the LLVM assembly by calling TheModule->dump() where TheModule is of type llvm::Module* . Development and upload procedure
Remember to push your solution source code to your repository:
Then each time you finish a component of your solution you can push it to the remote repository:
You have been given three helper programs to help you develop your solution to this homework.
Run your solution on testcases
Run your solution program on the testcases using the Python program zipout.py . Your solution must be compiled in the answer directory and must be called decafexpr . Run against all testcases as follows:
This creates a directory called output and a file output.zip which can be checked against the reference output files (see section on Check your solution below).
If you run zipout.py multiple times it will overwrite your output directory and zip file which should be fine most of the time (but be careful). Check your solution
Check your solution accuracy using the Python program check.py . You must create an output.zip file using the above step in Run your solution on testcases. Note that the references are only available for the dev testcases. When you are graded you will be evaluated on both the dev and test testcases. output.zip contains your output for both sets of testcases.
You can use the default program provided to get an initial solution to this homework. Run python3 zipout.py -r default to get a source.zip file you can score using check.py .
cd answer
git add decafexpr.y decafexpr.lex # and any other files you need for the solution
git commit -m initial solution
git push
git add [source-file]
git commit -m commit message [source-file]
git push
# go to the directory with the file zipout.py
python3 zipout.py
python3 check.py
Correct(dev): 2 / 100
Score(dev): 2.00
Total Score: 2.00
Package your source for Coursys
49 binary_operator = Plus | Minus | Mult | Div | Leftshift | Rightshift | Mod | Lt | Gt | Leq | Geq | Eq | Neq | And | Or 50
51 unary_operator = UnaryMinus | Not
52 }
)moc.buhtig//:sptth(buHtiGybhtiwdetsoh)ldsa-rpxefaced-elif#b48565a194a1ebd773feb9f00653e54c/rakraspoona/moc.buhtig.tsig//:sptth(ldsa.rpxefaced )ldsa.rpxefaced/1c85edb6ee6bd26666c1dd10991ea5f215203f82/war/b48565a194a1ebd773feb9f00653e54c/rakraspoona/moc.buhtig.tsig//:sptth( war weiv
You must also upload your source code to Coursys. You should prepare your source for upload using the Python program zipsrc.py .
# go to the directory with the file zipsrc.py
python3 zipsrc.py
This will create a zip file called source.zip . You should upload this file as your submission to hw1 on Coursys (https://coursys.sfu.ca/2019su-cmpt-379-d1/). Be careful: zipsrc.py will only package files in the answer directory. Make sure you have put all your supporting files in that directory. In particular, put
relevant documentation into answer/README.md .
If you add any testcases of your own please put them in the directories answer/testcases/[your-username]/ and
answer/references/[your-username]/ using the same convention used by zipout.py and check.py . Ground Rules
You must turn in two things:
Your source code from the answer directory as a zip file source.zip produced by running python3 zipsrc.py must be uploaded to the
hw3 submission page on Coursys (https://coursys.sfu.ca/2019su-cmpt-379-d1/).
Your output on the testcases which is the file output.zip produced by running python3 zipout.py must be uploaded to the hw3 submission page on Coursys (https://coursys.sfu.ca/2019su-cmpt-379-d1/). When we run check.py on the public testcases it should have a value higher than the output from the default program to get any marks.
Your source code from source.zip must be on your gitlab repository. Please commit and push often in order to get feedback on your code. Make sure that we can run make decafexpr in your answer directory to create the decafexpr binary.
You cannot use data or code resources outside of what is provided to you. If you use external code snippets provide citations in the
answer/README.md file.
For future homeworks, for the written description of your solution and supporting documentation, you can use plain ASCII but for math equations it is better to use kramdown. Do not use any proprietary or binary file formats such as Microsoft Word.
Grading
Score for testcases both dev and test.
Code review by TAs. Please check for comments on your code on gitlab (http://gitlab.cs.sfu.ca).
If you have any questions or youre confused about anything, just ask.
1. A more advanced version of the solution to this task is to do the scoping over the entire AST that is created by the parser. Implement the scoping by adding and removing new scopes in your symbol table by doing a top-down pass over the AST. This way of solving it is closer to the top-down code generation in Step 2 of this homework.
Last updated June 17, 2019.
Forked from the JHU MT class code on github (https://github.com/mt-class/jhu) by Matt Post (https://github.com/mjpost) and Adam Lopez (https://github.com/alopez).
Reviews
There are no reviews yet.