[SOLVED] C data structure Scheme html MIPS python compiler operating system graph SUSTech CS323 Compilers Project 3 Nov 19, 2019 IntermediateCode Generation

$25

File Name: C_data_structure_Scheme_html_MIPS_python_compiler_operating_system_graph_SUSTech_CS323__Compilers_Project_3_Nov_19,_2019_IntermediateCode_Generation.zip
File Size: 1394.16 KB

5/5 - (1 vote)

SUSTech CS323Compilers Project 3 Nov 19, 2019 IntermediateCode Generation
1 Overview
So far, weve completed the analysis part of the SPL compiler, and we are ready to move to the synthesis or generation part in this project. At the current stage, your compiler can recognize a valid SPL program. After project 3, your compiler will be able to generate a particular intermediate representation IR of a given source program, which can be further optimized for better runtime performance.
Most compilers translate a source program first to some form of IR and convert from there into machine code. The IR is a machine and language independent version of the original source code. Although converting the code twice introduces one additional step, the use of an IR provides increased abstraction, cleaner separation between the front and back ends. IRs also facilitate advanced compiler optimization. In fact, most optimization is done on the intermediate code, rather than the source code or target code.
Threeaddress code TAC will be the intermediate representation used in our SPL com piler. A TAC instruction can have at most three operands addresses. For example, there could be two operands to a binary arithmetic operator, where the third address stores the computational result. There are also cases where an instruction contains only two operands. To know the TAC instruction set used in our project, please read Appendix A carefully. We also provide a simulator to execute the generated TAC, which is already installed in our lab virtual machine.
You will continue with code generation on top of the semantic analyzer. For your imple mentation, you can insert generation routines inside the analysis procedures, so that your compiler can generate intermediate representation during semantic analysis. You can also generate code after all analyses are completed. Again, running in onepass is more efficient, though it is preferable to separate these stages generating code after the analysis stage for better modularity and flexibility. Please also be reminded to keep your code maintainable and extensible since the subsequent part of the SPL compiler relies on your work here.
2 Lab Environment
We will evaluate your program on an Ubuntu 18.04 64bit environment, since you will implement the code generator based on the last project, Flex and Bison are still necessary. Again, we list the dependencies below:
GCC version 7.4.0
GNU Make version 4.1GNU Flex version 2.6.4
1

SUSTech CS323Compilers Project 3 Nov 19, 2019
GNU Bison version 3.0.4
Python version 3.6.8
Spim version 8.0
urwid Python module 2.0.1
We prepared an IR simulator so that you can run your generated IR code. It is already installed in the lab virtual machine. It is built with a Python module called urwid. If you want to run it on your own computer, make sure you have installed the module. Note that urwid does not support Windows environment.
For example, if you want to run an IR program test a.ir with three integer inputs 1, 9, 42, type the following command:
irsim test a.ir i 1,9,42
The option i stands for input numbers, which are separated by commas ,. For a program without user input, this option can be simply ignored. You can also specify the log file path by l option. By default, logs will be saved to the file irsim.log at the current directory.
The simulator runs an IR program from the main function. There are three buttons in the panel, step will execute a single instruction, exec runs the program until the main function exits, and stop will terminate the current execution. The code section is under the three button. The instruction with a leadingsign is highlighted as the current program counter position. The right side of the simulator interface contains two panels. The one at the top records the runtime values of variables, while the one at the bottom displays program outputs. To exit the simulator, press the Esc key or send a kill signal C by CtrlC.
When the IR program terminates, the simulator will display the number of instructions it executed. We will use this number as the evaluation metric of the efficiency of your generated code. You should try your best to reduce the number of executed instructions.
2

SUSTech CS323Compilers Project 3 Nov 19, 2019 3 Intermediate Representation
In a broader sense, IR can be any form of the original program during compilation phases, such as the token stream generated by the lexical analysis, the parse tree generated by the syntax analysis, the threeaddresscode produced in code generation, etc. In this section, we will focus only on the IR produced during intermediatecode generation. This kind of IR serves as the bridge between a compilers frontend and backend. It is language and machineindependent, and facilitates crosscompilation and code optimization.
In term of representation format, there are linear IR, tree IR, graph IR, etc. In Project 1, we have converted the source code into a parse tree, which is conceptually a form of tree IR. The output of this project, the TAC program, is a kind of linear IR, however, you can use any other representations during your code generation. For example, you can organize the TAC instructions into a graph of basic blocks, in which a node represents a sequence of instructions without jumping, and an edge represents an unconditionalconditional jump between two basic blocks. This kind of IR is called a controlflow graph, which is helpful for many modern compiler optimizations, such as constant propagation and deadcode elimination.
Typically, you will not immediately print out the TAC segment once it is translated from some syntax nodes, since you may reorganize the code structure or do optimization on the intermediate code. So you need to store them in the memory before printing them into the generated code file. For this purpose, it is important to consider how to keep them with a proper data structure. Here we introduce two kinds of structures: linear IR and tree IR.
3.1 Linear IR
Threeaddress codes are often implemented as a set of quadruples. Each quadruple is an objectrecord that consists of four fields: an operator, two operands or sources, and a destination or result. To form blocks, the compiler needs a mechanism to connect individual quadruples. See Figure 1 for three different structures for implementing the threeaddress code for the expression a2b.
a Static array style b Pointer array style c Doublylinked list style
Figure 1: Three kinds of linear IR data structures
Figure 1a is the most straightforward implementation, in which a large array is allocated in advanced. Each element in the array stores a single TAC instruction. However, it suffers
Target Op Arg1 Arg2
t12
t2b
t3t1 t2
t4a
t5t4 t3
t1

2
t2
t3

b
t1
t2
t4

a
t4

a

t5
t4
t3
t1
t2

b
t3
t5

2
t1
t4
t2
t3
3

SUSTech CS323Compilers Project 3 Nov 19, 2019
from low efficiency. Every time a new instruction is inserted at the middle, you should pull all following elements one position behind. Whats more, the code size is limited by the arrays length. The pointer array implementation 1b inherits the same limitation of the array implementation, excepts that moving elements is cheaper since only the addresses are swapped, rather than the list of quadruples. Doublylinked list in Figure 1c is more recommended, though it is more complex than the previous two styles. Linked list data structure is efficient in both insertion, deletion and element replacement. Moreover, it does not require preallocated space, hence the code size can be arbitrarily large as long as the memory is sufficient.
Printing out linear IR is trivial: for the array style, just iterate from the start position until the end, and print out each element in the sequential order; as for linked list style, a whileloop may be followed from the head of list until the next field points to NULL.
3.2 Tree IR
Tree IR is naturally hierarchical, hence preserving the original program structure. For ex ample, syntax tree is a kind of tree IR. Organizing the threeaddress code IR into a tree structure is simpler than constructing a syntax tree from the source program, since the TAC specification is simpler than SPL syntax.
A syntax tree contains all the information about the parsed program. It is different from a parse tree. In a parse tree, interior nodes represent nonterminals in the grammar and the tree depicts the process of generating the source program from the grammar. In comparison, a syntax tree does not contain such grammatical details. Interior nodes in a syntax tree represent operators and their children nodes represent the data or argumentsoperands used in the operations. For this difference, parse trees and syntax trees are also called concrete syntax trees and abstract syntax trees AST, respectively.
Printing out tree IR requires a depthfirst search from the trees root node, as you have realized in Project 1.
4 Runtime Environment
Programming languages provide abstractions to hardware details, such as names, scopes, data types, operators, functions, and controlflow constructs. However, modern computer hardware can only understand lowlevel primitives, e.g., arithmetic operations on 3264 bit integers, data movement and control structure by boolean expressions. Compilers must work with operating systems to support highlevel abstractions on the target architectures.
To do so, a compiler cannot merely translate the code. It should also generate extra code to maintain highlevel features by means of lowlevel operations. During code generation, the compiler should manage targetmachine mechanisms so that the generated code can run on the same runtime environment.
The runtime environment contains many details related to the target machine. In this
4

SUSTech CS323Compilers Project 3 Nov 19, 2019
project, we only discuss about some general concepts and how they can be represented by our TAC specification. In the next project, we will introduce more about the target runtime, i.e., MIPS architecture.
4.1 Data Representation
Many highlevel languages provide various simple data types that are typically stored in a sufficiently large space.
characters: 1 or 2 wide character bytes
floats: 4 to 16 bytes
integers: 2, 4 or 8 bytes
booleans: 1 bit typically use at least 1 full byte
These types are directly supported by most hardware. In addition, the pointer types in Cfamily languages are stored as unsigned integers. Their ranges depend on the target architecture, i.e., 4 bytes for 32bit machines that support maximum 4GB memory addressing, and 8 bytes for 64bit machines. There is actually very little distinction between a pointer and an unsigned integer at lowlevel: they are all numbers, and the only difference is how the compiler interprets these numbers.
An array stores a group of elements in continuous memory space. These elements are stored consecutively from lowaddress locations to highaddress ones in the memory. Figure 2 shows a Cstyle array representation. The red labels on the blocks represent the memory address of each element, while b is the type length of a single element. Recall that the name arr is a constant pointer, which points to the start address of the array. In general, for a singledimensional array arr with n elements, the ith we count from 0 element is stored at the memory location arrib. The runtime of SPL programs also adopts this kind of array representation.
Figure 2: A Cstyle array in memory
There are two strategies to encode multidimensional arrays: true multidimensional array Figure 3a and array of arrays Figure 3b.
A true multidimensional array is one contiguous block. In a rowmajor configuration, the rows are stored one after another. For an mn 32bit integer array, we compute the address of an element arrij as arr4nij. Alternatively, in the array of arrays representation, accessing an element involves two pointer dereferences, while the previous representation requires only once. Moreover, array of arrays is more spaceconsuming: for
arr arrb arr2b arrn1b
arr0
arr1
arr2

arrn1
5

SUSTech CS323Compilers Project 3 Nov 19, 2019
a True multidimensional array representation strategy
b Array of arrays representation strategy
Figure 3: Two strategies for representing multidimensional array
an n1n2 nk array, the data fields is exactly the same as the true representation, while there will be extra n1 n2 nk1 pointer fields. The SPL runtime adopts the first strategy. Nonetheless, the second one still is considered advantageous for its flexibility.
A Structure variable is stored by allocating the fields sequentially in a contiguous block, then the member access is represented by the base address plus a member offset. The structure type structint i; char c; double d; in C language requires 13 bytes with the individual fields at offsets 0, 4 and 5 bytes from the base address. However, it will actually allocate 16 bytes for this struct, because it will insert padding between the fields so that each field starts on an aligned boundary, as shown in Figure 4.
Figure 4: An aligned structure variable
The SPL runtime is limited to integer type only, hence it is not essential to consider the data structure alignment. However, if you want to support more diverse data types, such as char or short, you should note that MIPS architecture requires all variables in the runtime memory aligned to the address length.
4.2 Function Invocation
Each active function has its own activation record, which stores the information related to the function invocation. These information includes arguments, local variables, registers, return address, etc. In most architectures, the activation record is stored in the stack, hence a functions activation record is also called its stack frame.
In this project, you will not deal with the details about how to maintain the stack frame, since it is machinedependent. You shall pass arguments explicitly with ARG statement, then invoke a function by its identifier with CALL statement.
a00
a01
a02
a10
a11
a12
a00
a01
a02
a0
a1
a10
a11
a12
char c
int i
aligned
double d
6

SUSTech CS323Compilers Project 3 Nov 19, 2019
There are two main approaches for passing arguments: pass by value and pass by reference. In the SPL runtime, primitive type arguments are passed by its true value. When a function is active, these values have their own copies on the stack frame, and changing the copies does not affect their original values, which are in the callers frame. As for a derived type argument, the callee function accepts its reference, i.e., its starting address. In C language, we will explicitly pass a struct pointer to avoid copying the whole structure, while in SPL, struct arguments are implicitly converted to their starting addresses. This runtime behavior should be maintained by your compiler: to pass a struct variable s1, you should push it by ARG s1 rather than ARG s1; to access a member at a particular offset, you should add the base address by the offset and then access the new address with the dereference operator .
One last note is, we push the arguments in reverse order see Appendix, what is the reason? The answer is related to the memory layout of the calling sequence. Take Linuxx86 runtime as an example:
low addr.
high addr.
Local variables
stack grows
Others
Return address
Argument 1
Argument 2

Previous frame
Figure 5: A runtime stack of a Linuxx86 process
When a function is invoked, a new frame will be pushed into the stack. Here, the push action contains multiple steps: firstly, arguments are pushed from high address to lower, then the return address is recorded on top of the arguments, and finally, the memory units for local variables are prepared. The memory section from the local variables to the arguments make up the new stack frame for the callee.
Since arguments are pushed one by one, the first pushed argument is at the bottom of the stack high address, and the last one is at the top low address. Thats why we should push them in reverse order. In this way, fetching arguments in the function body is straightforward: arguments will be represented by the offset from the return address register. In the example, argument 1 can be represented by ret4, and argument i at ret4i1. Here 4 bytes is the address length on a 32bit machine.
1Actually, there is no ret register, but the ebp register stores ret by constant offset
7

SUSTech CS323Compilers Project 3 Nov 19, 2019 5 Translation Schemes
To generate TAC from syntax tree, you need to traverse the tree in postorder, then con vert the tree nodes according to some particular patterns. You will resort to syntaxdirected translation to accomplish this task. At the implementation level, you have to implement a set of translate X functions for each nonterminal X, and invoke these functions recursively, either directly or indirectly. We will introduce some examples that translate particular pro ductions, though there may be other ways to translate the syntax tree, such as constructing the corresponding control flow graph and linking basic blocks by jump instructions.
5.1 Expressions
We categorize expressions into basic and conditional expressions. They correspond to a portion of the 26 rewriting rules of nonterminal Exp. We will not list all translation schemes. We leave some of them for you.
5.1.1 Basic expressions
Table 1 shows translation schemes for basic expressions. We assume that the symbol table, which you have built in Project 2, is globally accessible. Function symtab lookup accepts a string of variable identifier, and returns the corresponding symbol information. The pa rameter place is the address variable that stores the evaluation result of this expression. translation Exp returns TAC code of the node Exp and its children nodes.
In the right column, the code enclosed by a pair of brackets represents a concrete TAC instruction, and using plus signon two TAC instructions means concatenating them.
The translation schemes for the first two rewriting rules are simple. For the production of assignment expressions, Weve already constrained the lvalue to be an identifier, array expression or structure expression through semantic analysis. Here we only show the case when the left side is an identifier, i.e., Exp1ID, and we will talk more about the last two cases later in 5.4. Recall that assignment operator is righttoleft associated, so we should firstly evaluate Exp2, then assign the resulting value to the corresponding variable.
For arithmetic operations, we should evaluate two operands, then apply corresponding operator on them. Here we define a lefttoright evaluation ordering, hence we evaluate Exp1 before Exp2. The unary minus of an expression results in a negative case, which can be simply represented by subtracting itself from zero.
There are 9 3 booleans and 6 comparisons conditional expressions defined in SPL gram mar Their values are either 0 or 1. We translate these expressions by an auxiliary function translate cond Exp, which will be illustrated in 5.1.2.
5.1.2 Conditional expressions
The conditional expressions are typically used in if and while statements, which involve conditional jump instructions. These jump instructions will be generated together with
8

SUSTech CS323Compilers Project 3 Nov 19, 2019 Table 1: Basic expression translation schemes
INT ID
translate ExpExp, placecase Exp of valueto intINT
return place : value
variablesymtab lookupID
Exp1 ASSIGN Exp2
Exp1 PLUS Exp2
MINUS Exp
cond. Exp
return place : variable.name variablesymtab lookupExp1.ID tpnew place
code1translate ExpExp2, tp code2variable.name : tp code3place : variable.name return code1code2code3
t1new place
t2new place
code1translate ExpExp1, t1
code2translate ExpExp2, t2
code3place : t1t2
return code1code2code3
tpnew place
code1translate ExpExp, tp
code2place : 0tp
return code1code2
lb1new label
lb2new label
code0place : 0
code1translate cond ExpExp, lb1, lb2
code2LABEL lb1place : 1LABEL lb2 return code0code1code2
the conditional expressions IR code. We define an auxiliary function translate cond Exp, which takes two additional parameters, lb t specifies the location label to jump when the condition evaluates to TRUE 1 in its essence, and lb f for the inverse case, to translate a conditional expression. You should ensure the first argument of translate cond Exp is always a conditional expression. This can be checked before translation.
The schemes in Table 2 produce shortcircuit evaluations for operation ANDOR. To see the effect, consider rule Exp1 AND Exp2: when Exp1 evaluates to FALSE i.e., zero by the IR code of translate cond ExpExp1, lb1, lb f, the control flow jumps directly to lb f and code2 is skipped. For the case it is TRUE, code2 will be executed to evaluate the whole ANDexpression. The same mechanism holds for ORexpressions.
We list translation schemes for 4 conditional operations for the other cases, simply replace
9

SUSTech CS323Compilers Project 3 Nov 19, 2019 Table 2: Conditional expression translation schemes
Exp1 EQ Exp2
translate cond ExpExp, lb t, lb fcase Exp of t1new place
t2new place
code1translate ExpExp1, t1
Exp1 AND Exp2
Exp1 OR Exp2 NOT Exp
code2translate ExpExp2, t2
code3IF t1t2 GOTO lbtGOTO lbf
return code1code2code3
lb1new label
code1translate cond ExpExp1, lb1, lb fLABEL lb1 code2translate cond ExpExp2, lb t, lb f
return code1code2
lb1new label
code1translate cond ExpExp1, lb t, lb1LABEL lb1 code2translate cond ExpExp2, lb t, lb f
return code1code2
return translate cond ExpExp, lb f, lb t
EQ token. In fact, since C is weakly typed, nonconditional expressions can also be used in control flow statements. A widely used example is whileT block, where the loop body will execute T times for any nonnegative T, and the loop terminates when T reaches zero. You can think about how to translate this kind of control flow, and realize it as your compilers bonus feature.
5.2 Statements
Table 3 shows translation schemes for statements. To translate controlflow statements i.e., if and while, we utilize the previously defined function translate cond Exp. Translat ing conditional expressions introduces conditional jumps, while there are only unconditional jumps in statements translation schemes.
5.3 Function Invocations
There is one thing to do before translating the functions: you should add a read function to the symbol table, which takes no parameter and return an integer value, and a write function that accepts a single integer parameter. These predefined functions provide userinteraction for SPL programs. They have special translation schemes: read function invocation should be translated into the READ instruction, and write to WRITE, similarly.
There are two rewriting rules related to function invocation, the one with arguments and the other without. The builtin function read and write have their own translation schemes.
10

SUSTech CS323Compilers Project 3 Nov 19, 2019 Table 3: Statement translation schemes
RETURN Exp SEMI
IF LP Exp RP Stmt
IF LP Exp RP Stmt1 ELSE Stmt2
WHILE LP Exp RP
Stmt
translate StmtStmtcase Stmt of tpnew place
codetranslate ExpExp, tp return codeRETURN tp
lb1new label
lb2new label
code1translate cond ExpExp, lb1, lb2LABEL lb1 code2translate StmtStmtLABEL lb2
return code1code2
lb1new label
lb2new label
lb3new label
code1translate cond ExpExp, lb1, lb2LABEL lb1 code2translate StmtStmt1GOTO lb3LABEL lb2 code3translate StmtStmt2LABEL lb3
return code1code2code3
lb1new label
lb2new label
lb3new label
code1LABEL lb1translate cond ExpExp, lb2, lb3 code2LABEL lb2translate StmtStmtGOTO lb1 return code1code2LABEL lb3
They are shown at the first two rows of Table 4. Note that write function takes only one argument, hence we rewrite the Args symbol directly to Exp.
Lets revisit the arguments passing order in the SPL runtime. Weve already mentioned 4.2 that, the arguments should be pushed into the stack in reverse order as they declared. In the translation scheme of Args, we accomplish this behavior by passing an additional arg list parameter. Each time a new argument is evaluated, its reference is inserted at the head of this list. Before inserting the CALL instruction, we pop these references from arg list and generate their corresponding ARG instructions. Here we simulate a firstinlastout list, hence the arguments for a callee function are evaluated from left to right, and they will be pushed into the runtime stack from righttoleft.
Its worth mentioning that the order of evaluation2 of C language is an undefined be havior. The function arguments are specified by commaexpression, and a compiler can evaluate each subexpression in an arbitrary order3. For example, to invoke fooab, ab,
2 https:en.cppreference.comwclanguageevalorder
3The order of evaluation is different to the concept of associativity: the comma operator is lefttoright associated, but its evaluation order is undefined
11

SUSTech CS323Compilers Project 3 Nov 19, 2019 Table 4: Function invocation translation schemes
translate ExpExp, placecase Exp of return READ place
tpnew place
return translate ExpExp, tpWRITE tp functionsymtab lookupID
return place : CALL function.name functionsymtab lookupID
arg listEMPTY LIST
code1translate ArgsArgs, arg list code2EMPTY CODE
for i1 to arg list.length:
code2code2ARG arg listi
return code1code2place : CALL function.name translate ArgsArgs, arg listcase Args of
read LP RP write LP Exp RP
ID LP RP
ID LP Args RP
Exp
Exp COMMA Args
tpnew place
codetranslate ExpExp, tp
arg listtparg list
return code
tpnew place
code1translate ExpExp, tp
arg listtparg list
code2translate ArgsArgs, arg list return code1code2
a, a compiler can generate code that evaluate from lefttoright, from righttoleft, or even evaluates ab first, then a, then ab. In the provided translation scheme, we define a left toright evaluation order for function arguments. However, it is up to you to generate code with another evaluation order. The only important matter to that the arguments should be passed in a proper order.
5.4 Derived Data Types
There are two derived date types in SPL: arrays and structures. Unlike integers, of which the width is fixed, we should dynamically allocate memory before using variables of derived types, by using DEC statement with the size of the declared type. Generally, derived data types can be treated as collections of elements. Accessing a particular element can be represented by the form baseoffset, where offset is calculated from the size of preceding elements. To use a derived type variable, there are two rewriting rules: ExpExp LB Exp RB and ExpExp DOT ID, both of them will be translated into memory offset calculation.
We adopt true multidimensioned array representation, which stores all elements in a
12

SUSTech CS323Compilers Project 3 Nov 19, 2019
single consecutive block. In an SPL array, all elements are equal in size, and the sizes of all arrays in the same lower dimension are also equal. So we can calculate the address of an element by assuming that the data type of the array element is T:
ADDRarri1 . . . inADDRarrOFFSET arri1 . . . inSIZE T
The OFFSET function does not involve the data size. It only depends on the number of preceding elements. Take 3dimensional array arrlmn as an example, the offset of element arrijk is:
OFFSET arriikimnjnk
The above formula can be easily generalized to arrays with higher dimensions.
For structure variable, the offset is calculated by the sum of preceding elements sizes. Though we assume all variables are integer, you can design your own way to represent other data types, such as character, or short or even floatingpoint number4. If the fields in a structure variable are of different types, we cannot calculate the offset and multiply it with
a constant size. The formula of addressing a particular field is:
i1 ADDRst.fiADDRst SIZEst.ft
t
However, you should consider the alignment. In the runtime, all structure fields should be aligned to 32bit boundary. To simplify the implementation, you can allocate all primitive types in 4 bytes.
There are more complex cases, such as accessing an array element, while the array is a field of a structure. With proper data type representation and program design pattern, dealing with such complex cases is not very difficult. Since you are not required to implement derived type variables, we will not give you the concrete translation schemes. Nevertheless, you can implement this feature to obtain bonus score.
5.5 Hints
We recommend you to separate the class of intermediate representation collection of TAC instructions. They can be organized by any form Section 3. Then, you may construct a sample TAC program artificially and run it on the IR simulator to verify your IR class runs correctly.
You need to implement translate X functions for nonterminals. We have introduced translation schemes for expressions and statements, and you should think about how to deal with other cases such as data initialization, structure member accessing, etc. Once you have understood and realized the translation schemes introduced previously, it should be simple to write new schemes. Writing more test cases by your own always helps find your compilers inadequacy.
4You need to convert them into integer, since the IR simulator only support integer
13

SUSTech CS323Compilers Project 3 Nov 19, 2019 6 Project Requirements
6.1 Basic Requirements
In the intermediatecode generation stage, the input format is the same as the previous project, i.e., the executable splc accepts a single command line argument representing the SPL program path. Our test cases are free from lexicalsyntaxsemantic errors, so your task is to generate TACIR that preserves the behavior of this program without considering reporting errors.
We already assume that the input test cases do not contain any sort of nonlogical error, though you are free to keep those checking routines. You dont need to consider the comments, hexadecimal form of integers or characters, or variable definition with nestedscopes, since they are not required in the previous projects. You should carefully read the assumptions in Section 6.2 to avoid getting stuck at overcomplicated situations such as structures with arrays as fields or arrays of structures, etc..
You should compile your IR generator as the splc target in Makefile, and move the executable to the bin directory. For example, the Makefile is placed under the project root directory, then we make the splc target by:
make splc
which generates the executable. Then we run it by:
binsplc testtest 3 r01.spl
The generator should print the corresponding TACIR code in text file testtest 3 r01.ir, and we will verify the generated IR using the provided IR simulator.
Unlike previous projects, this time we dont have optional requirements. There are only required and bonus implementations. However, your score for the required part will be subdivided into basic score and competitive score, as shown in Section 6.3. If your compiler can deal with cases beyond our requirements and generate IR according to the given TAC specification, and you provide your own test .spl files and report the bonus features in your report, we will give you bonus score based on the difficulty of design and implementation.
6.2 Assumptions
Here we present the assumptions for our test cases, which means that you can build your compiler safely by ignoring their violations. First of all, all tests are free of errors that should be caught by a lexicalsyntaxsemantic analyzer we suppose there are also no logical errors. The other assumptions are:
Assumption 1 there are no comment, hexadecimalform integers, floatingpoint num bers and characters either variables or constant literals
Assumption 2 there are no global variables, and all identifiers are unique Assumption 3 the only return data type for all functions is int
14

SUSTech CS323Compilers Project 3 Nov 19, 2019 Assumption 4 all functions are defined directly without declaration
Assumption 5 there are no structure variables or multidimensional dimension higher than one arrays
Assumption 6 functions parameters are never structures or arrays
6.3 Project Requirements
6.3.1 Basic requirements
Your compiler shall translate the input SPL program into the given TAC representation in Appendix A. To show how your compiler works, see the following SPL program:
Listing 1: SPL test case for intermediatecode generation
1 int main 2
3 int n;
4 nread;
5 if n0 write1;
6 else if n0 write1;
7 else write0;
8 return 0;
9
The code is a signum function, which outputs 1 for positive numbers, 1 for negative numbers, and 0 for zero. A valid intermediate representation of this program is:
FUNCTION main :
READ t1
v1 : t1
t2 : 0
IF v1t2 GOTO label1
GOTO label2
LABEL label1 :
t3 : 1
WRITE t3
GOTO label3
LABEL label2 :
t4 : 0
IF v1t4 GOTO label4
GOTO label5
LABEL label4 :
t5 : 1
t6 : 0t5
WRITE t6
GOTO label6
LABEL label5 :
15

SUSTech CS323Compilers
t7 : 0
WRITE t7
LABEL label6 :
LABEL label3 :
t8 : 0
RETURN t8
Project 3
Nov 19, 2019
Our sample output adopts the naming convention that, variable names follow the pattern tn or vn , and labeln for label names. However, this is not compulsive. Your compiler can generate any valid names as you wish.
Note that the above TAC IR is far from efficient. For example, it assigns constant 0 to four variables t2, t4, t7 and t8, while they can be the same local variable. Also, there are dummy labels like label6. You may build a compiler that generates more efficient code.
6.3.2 Competitive score
This time, as aforementioned, we will evaluate the code efficiency in terms of the number of executed instructions, and assign competitive score to you based on your compilers per formance. For example, the generated code in Section 6.3.1 will execute 10 instructions for positive inputs and 15 instructions for negative inputs. However, it can be optimized so that we could obtain the same result with fewer instructions.
We will evaluate your compilers from two aspects:passed tests,executed instructions. Given two compiler, the one that passes more test cases will be considered more superior and thus ranked higher. If two compilers pass the same number of tests, the one that executes fewer instructions will be ranked higher.
If your compiler is ranked at top 20, you will obtain 100 of the competitive score. If it is in the range 20, 40, you will obtain 80 of the score. If it is in the range 40, 60, you will obtain 60 of the score. If it is in the range 60, 80, you will obtain 40 of the score. If it is ranked at the last 20, you will still obtain 20 of the score.
6.3.3 Bonus implementation
You are encouraged to design and implement your own language features. However, to ensure that the generated code can run safely in the IR simulator, you should carefully design the translation schemes so that the extra features can be supported by the given TAC instructions. A suggested bonus feature is to support continuebreak statements in loops. You can easily extend SPL grammar by adding new terminals, and translate the constructs into some kinds of GOTO structures.
You can also implement your compiler by modifying the assumptions, for example:
modify Assumption 5 and Assumption 6, so that structure variables can appear in the program, and they can be declared as function parameters. Still, assignment operations will not be directly performed on a structure variable.
16

SUSTech CS323Compilers Project 3 Nov 19, 2019
modify Assumption 5 and Assumption 6, so that singledimensional arrays can be declared as function parameters, and multidimensional arrays can be defined as local variables
6.4 Grading Policy
We have a suite of test cases to evaluate your compiler. Some of the test cases will be released to you. The maximum score of this project, excluding the bonus part, is 100 points. Your score depends on which test cases your compiler can pass, the code efficiency, and the quality of your report. If your compiler passes all test cases, you will get 70 points, that is your total score of the basic requirement. The competitive score is 20 points, it is assigned based on the efficiency of the code that your compiler generated, as we aforementioned. The quality of your project report accounts for the remaining 10. There are no unique answer for the test code. Any generated IR that runs under expectation is correct.
Since we do not have preprepared test cases to evaluate the bonus features implemented by you, you are required to write your own test cases. The bonus points you can get depend on the difficulty of the features and how well you have implemented and tested them. The maximum bonus you can get for this project is 20 points.
7 Submission
What to submit You are required to submit your Cflexbison source code and other related files in a .zip archive with file name format: StudentIDprojectNumber.zip e.g., 11849180project3.zip. In this project, the zip file tree is:
StudentIDproject3
bin
splc generated
report
StudentIDproject3.pdf
test
test3r01.spl
test3r01.ir
our provided tests
testa.spl
testa.ir
testb.spl
testb.ir
your selfwritten tests
Makefile
CFlexBison source code
17

SUSTech CS323Compilers Project 3 Nov 19, 2019
bin directory contains a single executable file named splc, which is generated by an splc
target in the Makefile, make sure it works properly in our environment Section 2.
report directory contains a pdf file that illustrates your design and implementation, you should focus on the optimization and bonus features your have realized, since the basic requirement is rather simple and straightforward. Your report should not exceed 4 pages, we suggest you to use 11pt font size and singleline spacing for main content.
test directory is released in the GitHub repository, though you may add additional test code written by yourself. We do not constrain the naming convention on test code, except the name of the code file should end with .spl extension and the name of the generated code file should end with .ir.
The Makefile is provided. You can add any targets, but most importantly, you have to guarantee the splc target compiles and generates a single executable splc the analyzer in the bin directory. Otherwise, we cannot evaluate your work and you will get 0 for this project.
In this project, you will write code in CFlexBison, you can place them directly under the submitted directory, or under separate folders like src and include. Again, its your job to ensure the code can be compiled successfully, no matter where they are.
How to submit You should upload your zip file on SAKAI before the 11:55 PM, Decem ber 15, 2019. Late submissions are still allowed within 2 weeks after the deadline grace period. If you submit your code within 1 week after the deadline, your score will be 80 of the score you could get if the submission was made before the deadline. Later if you submit your project before the end of the grace period December 29, you can obtain only 60 of the original score. Code submitted after the grace period will not be graded, meaning that you will get a zero for this project.
Contact For any question regarding this project, feel free to contact the teaching assistant. You can send an email to 11849180mail.sustech.edu.cn with the subject:
CS323 Project 3 YourNameStudentID
e.g., CS323 Project 3 WangSinan11849180
8 Resources
Here we list some links that you may find useful information during this course.
C language tutorial: http:www.stat.cmu.edubriancprog.html
Simple Makefile tutorial: http:www.cs.colby.edumaxwellcoursestutorials maketutor
18

SUSTech CS323Compilers Project 3 Nov 19, 2019
Compiler tools FlexBison introduction: http:dinosaur.compilertools.net
FlexBison introduction: https:aquamentus.comflexbison.html
Stanford University CS143 Compilers: https:web.stanford.educlassarchive cscs143cs143.1128
GNU C Compiler InternalsGNU C Compiler Architecture: https:en.wikibooks. orgwikiGNUCCompilerInternalsGNUCCompilerArchitecture
Buffer Overflow ExploitDhaval Kapil: https:dhavalkapil.comblogsBufferOverflowExplo
19

SUSTech CS323Compilers Project 3 Nov 19, 2019 Appendix
A ThreeAddress Code Specification
Table 5 defines the instruction set of the threeaddress code used in our SPL compiler. Your generated TAC should strictly follow the specification, otherwise, our IR simulator cannot recognize your generated code. Two important things to note: 1 the empty space cannot be eliminated, particularly, there must be an empty space between the labelfunction name and the colon the first two rows; 2 all keywords LABEL, CALL, etc. must be in upper case, or they will be recognized as identifiers.
Instruction
LABEL x : FUNCTION f : x:y
Table 5: Threeaddresscode specification
Description
define a label x
define a function f
assign value of y to x
arithmetic addition
arithmetic subtraction
arithmetic multiplication
arithmetic division
assign address of y to x
assign value stored in address y to x copy value y to address x unconditional jump to label x
x :y x :y x :y x :y
x:y x:y x:y GOTO x
IF x relop y
RETURN x
z z z z
GOTO z if the condition binary boolean is true, jump to label z exit the current function and return value x
DEC x size
PARAM x
ARG x
x : CALL f
READ x
WRITE x
allocate space pointed by x, size must be a multiple of 4 declare a function parameter
pass argument x
call a function, assign the return value to x
read x from console
print the value of x to console
We explain the instructions in detail below:
Keyword LABEL defines a target, which can be jumped to by the GOTO statement
FUNCTION specifies a set of instructions as a function. The function definition ends when a new function is defined, or upon reaching endoffile
Assignment operation assigns the value on its right side rvalue to the left side symbol lvalue. An lvalue can only be a variable, while the rvalue can be either variable or immediate value. An immediate value is specified by the form n. For example, the TAC instruction t1 : 9 means assign constant value 9 to the variable t1
20

SUSTech CS323Compilers Project 3 Nov 19, 2019
There are four arithmetic operations in the TAC, addition, subtraction, multiplication,
division, the operands can be either variables or immediate values
Thesymbol stands for address of, which returns the address of a variable. For example, the instruction b : a8 means adding the address of a by 8, then assign the new address to b
If an dereference operatoris attached to a rightside variable in an assignment operation, it returns the value stored at that address. When x appears in the left side of an assignment statement, it means that the rvalue should be copied to the address x
There are two types of GOTO jump statements, unconditional jump and conditional jump. An unconditional jump to x will move the program counter to the line right after LABEL x. As for conditional jump, if the boolean expression evaluates to true, then the GOTO part executes, otherwise, the program counter moves to the next instruction. A relational operator relop can be any of the six: , , , , !,
RETURN statement moves the current program counter from a function to its caller, and returns a value x, which can be either a variable or an immediate value
The keyword DEC is designed for allocating consecutive memory. It is useful for derived type variables such as array or structure. The unit of size is byte, so an integer array of size 10 should take 40 bytes in memory. For those primitive type int variables, there is no need to allocate memory using DEC
The following three instructions are related to function invocation. PARAM is used for declaring function parameters, it follows the FUNCTION statement. For example, if function foo has three parameters, its declaration is translated into:
FUNCTION foo :
PARAM p1
PARAM p2
PARAM p3
To call this function, you should pass the arguments onebyone in reverse order, then apply the CALL statement:
ARG a3
ARG a2
ARG a1
t : CALL foo
READ and WRITE are designed for user interaction. In our IR simulator, READ statement can read a userinput integer from the console, and WRITE prints an integer value to the console
21

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] C data structure Scheme html MIPS python compiler operating system graph SUSTech CS323 Compilers Project 3 Nov 19, 2019 IntermediateCode Generation
$25