In this project, you will get familiar with the RUX86programming and build an interpreter for RUX86. RUX86will be the target for the later projects. This project alsointroduces more constructs in OCaml not used on Project 1.
Following are the files in the project2.tar.gz distribution
** assert.ml the assertion framework** gradedtests.ml graded test cases that we provide** main.ml the main test harness** rux86.mli the RUX86 interface** rux86.ml the RUX86 instruction set implementation** rux86interpreter.mli the interpreters interface** providedtests.ml your submitted test cases (Part I) should go** rux86interpreter.ml your interpreter code (Part II) should be written here** report.txt your experience report summary** RUX86-specification.txt RUX86 ISA Specification
(2) RUX86 Interpreter
RUX86 assembly code is organized into labeled blocks ofinstructions, which might be written in concrete syntax as shownbelow for the example discussed in Lecture 3.
square:pushl %ebpmovl %esp %ebpmovl 8(%ebp), %eaximull %eax, %eaxpopl %ebpret
f:pushl %ebpmovl %esp, %ebpmovl 12(%ebp), %eaxmovl 8(%ebp), %ebxsubl %ebx, %eaxpushl %eaxcall squareaddl $4, %esppopl %ebpret
program:pushl %ebpmovl %esp, %ebppush $12push $3call fadd $8, %esppopl %ebpret
This code has three blocks, labeled program, f, and square. The codeat labels f and square implement the following C code.
int square (int x){return x * x;}
int f (int x1, int x2){int z = x2 x1;return square(z);}
int program (){return f(3, 12);}
The above code has three functions: program, f and square. Thecode in program calls function f with immediate values 3 and9. When you run use the RUX86AssemblyAST.ml described in lecture 3, youcan obtain the assembly file, which can be compiled with the stub codeprovided to you in lecture 3.
In this project you will implement an interpreter for RUX86programs, but rather than using the concrete syntax shown above, youwill interpret programs written in an abstract syntax. For example, theprogram above represented with the RUX86 abstract syntax isshown below. The main structure is a list of insn_blocks, each ofwhich is just a record that has a lbl and a list of insn values.
[(mk_block f [Push ebp;Mov (esp, ebp);Mov (stack_offset 2, eax); (* x2 *)Mov (stack_offset 1, ebx); (* x1 *)Sub (ebx, eax);Push (eax);Call (Lbl (mk_lbl_named square));Add (Imm 4l, esp);Pop (ebp);Ret;]);(mk_block program [Push ebp;Mov (esp, ebp);Push (Imm 12l);Push (Imm 3l);Call (Lbl (mk_lbl_named f));Add (Imm 8l, esp);Pop ebp;Ret;]);(mk_block square [Push ebp;Mov (esp, ebp);Mov (stack_offset 1, eax);Imul (eax, Eax);Pop ebp;Ret;]);]
As you can see, the correspondence between the abstract syntax and theconcrete syntax is quite close. The file rux86.ml and itscorresponding interface rux86.mli together provide the basicdefinitions for the creating and manipulating RUX86 abstract syntax the main types you should be aware of are lbl, reg, operand, ccode,insn, and insn_block. Each of these corresponds pretty directly to aconcept from the RUX86 assembly language specification. Read theRUX86 specification for details about each instruction.
The RUX86 specification is written from the point of view ofactual X86 hardware. The interpreter you implement should be faithfulto that specification, except for a few points describe below.
The ML-level interpreters representation of the RUX86 machinestate is given by the this type:
type x86_state = {s_mem : int32 array; (* 1024 32-bit words the heap *)s_reg : int32 array; (* 8 32-bit words the register file *)mutable s_OF : bool; (* overflow flag *)mutable s_SF : bool; (* sign bit flag *)mutable s_ZF : bool; (* zero flag *)}
The memory and register files are simulated by OCaml-level (mutable)arrays of int32 values. The three condition flags are mutable booleanfields; all of the state is bundled together in a record (see IOCChapter 8.1 for more about OCamls record types). The main differencesbetween the interpreter and real X86 hardware are:
*** Heap size:
To facilitate debugging, and to use less memory for the simulation,our interpreter will use only 4096 bytes (1024 32-bit words) for itsheap size. The part of the heap simulated is the highest addressablememory locations in rux86interpreter.ml these locations arerepresented as mem_top and mem_bot.
You will need to complete the map_addr function found inrux86interpreter.ml that takes an object-level RUX86 address(i.e. a 32-bit address) and returns the meta-level int index into thes_memory array. See the documentation of this function inrux86interpreter.mli.
*** Code labels:-
Unlike the real X86 machine, our interpreter does not store programcode in the main memory (i.e. the s_memory array). Instead, it willtreat a list of insn_block values as the program to be executed, thecode labels associated with those instruction blocks are the onlylegal jump (and call) targets in the interpreter. A real X86 machinecan jump to an address calculated using arithmetic operations; doingso will cause the interpreter to throw an X86_segmentation_faultexception.
Treating labels abstractly presents a challenge for theinterpreter we dont have a concrete machine address to use for EIPthe instruction pointer, which real machines use to keep track ofthe address of the next instruction to execute. Instead, yourinterpreter will have to simulate the instruction pointersomehow. Doing so is fairly straightforward in most cases, however theCall instruction is supposed to push a return address (derived fromEIP) onto the program stack and Ret is supposed to pop a previouslypushed address from the stack and jump to it.
To simulate thisbehavior, your interpreter will need some auxiliary mechanism to trackthe stack of return addresses that would have been pushed byCall. Both Call and Ret instructions should still also manipulate theX86 stack pointer Esp. Push x00000000 to the stack for all Callinstructions in place of the actual EIP.
Because of this treatment of code labels and the EIP, your interpreterdoes not have to be 100% faithful to X86 semantics. In particular,programs that do not have well-balanced uses of Call and Ret areill-defined, as are programs that make exotic use of the stack(e.g. ones that try to manipulate a return-address that was pushed byCall by manually popping it from the stack or reading it directly frommemory).
For similar reasons, in this interpreter, it is invalid to try to geta word stored at a label (or to attempt to perform indirect accessusing a label).
** No Fall-through:
Another consequence of this abstract treatment ofcode labels is that your interpreter cannot make any assumptions abouthow code might be laid out in memory. Consider the following assemblyprogram:
label1:movl $1, %eaxlabel2:movl $2, %ebx
If executed on a real X86 machine starting from label1, the machinewill first move 1 into EAX and then fall through to the nextinstruction, which moves 2 into EBX. In your interpreter, this codemight be represented as:
[(mk_insn_block (mk_lbl_named label1) [Mov(eax, Imm 1l);]);(mk_insn_block (mk_lbl_named label2) [Mov(ebx, Imm 2l);]);]
Because the labels are abstract, the order of insn_blocks could havebeen switched the interpreter doesnt know about the order in whichcode blocks might be laid out in memory, so it cant simulate fallthrough.
[(mk_insn_block (mk_lbl_named label2) [Mov(ebx, Imm 2l);]);(mk_insn_block (mk_lbl_named label1) [Mov(eax, Imm 1l);]);]
Instead, your interpreter should raise a X86_segmentation_faultexception if the program would ever need to fall through the end of aninsn_block that is, both representations of the program above, ifstarted at label1 would result in such an exception. We call aninsn_block valid if the last instruction of the block is either a Retor some form of jump instruction. All of the functionality tests weprovide will use programs whose blocks are all valid. For example, thecode for factorial given above is valid because each of its blocksends in a Ret instruction. In general it is always possible to writeprograms in this form (you can always jump to the label that wouldotherwise be reached by falling through the end of a block).
Tasks=====
Complete the implementation of the rux86interpreter.mli interface inthe rux86interpreter.ml file, some parts of which are given to you.
We recommend that you do things in this order:
(1) As a warm-up, implement the map_addr function.
(2) As an exercise in condition codes, implement the condition_matchesfunctions.
(3) Then implement the interpretation of operands (includingindirect addresses), since this functionality will be needed forsimulating instructions. Then work on the instructions themselves,perhaps starting with developing a strategy for simulating Call andRet. For this, you might want to implement an auxiliary function thattakes extra state, and you might want to define extra functions thatare mutually recursive with interpret. There is more than one way toimplement the desired Call and Ret behavior; somehow youll have tosimulate their expected semantics.
Hints:
We have provided several helper functions for 32- and 64-bitarithmetic and for getting a bit out of a 32-bit word. You might findthem useful. Youll probably want a function that sets the threecondition flags after a result has been computed. Groups ofinstructions share common behavior for example, all of thearithmetic instructions are quite similar. You probably want to factorout the commonality as much as you can (to keep your code clean). Youwill probably want to develop small test cases to try out thefunctionality of your interpreter. See gradedtests.ml for someexamples of how to set up tests that can look at the final state ofthe machine.
Tests=====
We will grade this part of the assignment based on a suite oftests. Some of them are available for you to look at ingradedtests.ml, the rest of them we reserve for our own cases. We willalso stress-test your interpreter on a number of big programs (seePart II) that we have developed and that your classmates will developas part of this project.
You can add your test cases to the test suite defined inprovidedtests.ml
(2) RUX86 Programming=====================
For this part of the assignment, you will create (by hand) anon-trivial RUX86 assembly program to test your interpretersbehavior and get some experience programming in RUX86. The factorialprogram supplied with the test code is an example of what we mean bynon-trivial test cases of roughly this level of difficulty canearn full credit. In particular, your program should include:Non-trivial control flow: either nested loops, a recursive function,or something similarly complex At least one conditional branch. Someamount of arithmetic or logic. A test case (or test cases) thatexercise your program and test the correctness of the interpretersbehavior. If your program computes a simple answer, it can return itin Eax and you can use the run_test harness as in the factorialexample; for more complex programs you might want to use the st_testfunction, which lets you examine the resulting state of theinterpreter. Some good candidates for such programs include: simplesorting or searching algorithms (i.e. treat some chunk of memory as anarray of values to be sorted), simple arithmetic algorithms such asgcd, recursive functions over integers or very simple data structuressuch as linked lists. If you are unsure whether the test case youdlike to implement is sufficient, contact us.
Your test should be a function of type unit unit that works in theassertion framework (as defined in assert.ml). This test should besupplied in providedtests.ml as the Student-Provided Big Test forPart II. We will hand grade this submission (and test it against ourown interpreter). We will also use all correct submitted tests tovalidate all of the other projects in the course the trickier yourtest is, the harder it will be for other teams to pass it!
(3) Assignment submission
Most of the instructions for this project are found as comments in thesource files. For this assignment, you should primarily fill therux86interpreter.ml and providedtests.ml and submit your directory asassign2.tar.gz directory.
When we execute the following set of commands on a Linux box, yoursubmission should execute the tests. If you create the tar file in thewrong fashion, it may not have the correct directory structure.
tar -zxvf assign2.tar.gzcd assign2./ocamlbuild main.native./main.native test
Instructions for creating the tar file. Lets say all your assign2materials are in the assign2 directory
#lsassign2
#tar -c assign2 assign2.tar
#gzip assign2.tar
The last command should create assign2.tar.gz
(4) Grading
You will get no credit if your submitted program does not compile.Apart from the test already given, we will run some hidden tests. Yourpoints will be 50% for passing the given tests (with correctly writtencode) and 50% for the hidden tests. Your score for the given testswill be displayed by the test harness.
Reviews
There are no reviews yet.