Making the LC4 Assembler Instructions
Contents
Assignment Overview
Learning Objectives
Advice
Getting Started
Codio Setup
Starter Code
Object File Format Refresher
Requirements
General Requirements
Assembler
assembler.c: main
asm_parser.c: read_asm_file
asm_parser.c: parse_instruction
asm_parser.c: parse_add
asm_parser.c: parse_xxx
asm_parser.c: str_to_bin
asm_parser.c: write_obj_file
Extra Credit
Suggested Approach
High Level Overview
Great High Level Overview, but I really need a Slightly More Detailed Overview
Part 0: Set Up the main Function to Read the Arguments
Part 1: Read the .asm File
Part 2: Parse an Instruction
Part 3: Parse an ADD Instruction
Part 4: Converting the binary string to a hexadecimal-formatted integer
Part 5: Writing the .obj object file
Testing
Validate Output with PennSim
Files for Testing
Unit Testing
GDB for Debugging
Submission
Submission Checks
The Actual Submission
Grading
Assembler
Extra Credit
FAQ
Quick Hints
Formatting
Endianness
Resources
Assignment Overview
In C, files fall into two categories: “text” and “binary”. In this assignment you’ll work with both types by reading in a text file and writing out a binary file.
You will read an arbitrary .asm file (a text file intended to be read by PennSim) and write a .obj file (the same type of binary file that PennSim would write out).
Aside from reading and writing the files, your task will be to make a mini LC4 assembler! An assembler is a program that reads in assembly language and generates its machine-code equivalent.
This assignment will require a bit more programming rigor than we’ve had thus far, but now that you’ve gained a good amount of programming skill in this class and in others, it is the perfect time to tackle a large programming assignment (which is why the instructions are so many pages).
Learning Objectives
This assignment will cover the following topics:
● Review the LC4 Object File Format
● Read text files and write binary files
● Assemble LC4 programs into executable object files
● Use debugging tools such as GDB
Advice
● Start early
● Ask for help early
● Do not try to do it all in one day
Getting Started
Codio Setup
Open the Codio assignment via Canvas. This is necessary to link the two systems.
You will see many files. At the top-level workspace directory, the main files are asm_parser.h, asm_parser.c, assembler.c, and PennSim.jar.
Do not modify any of the directories or any file in any of the directories.
Starter Code
We have provided a basic framework and several function definitions that you must implement.
assembler.c – must contain your main function.
asm_parser.c – must contain your asm_parser functions.
asm_parser.h – must contain the definition for ROWS and COLS
– must contain function declarations for read_asm_file, parse_instruction, parse_reg, parse_add, parse_mul, str_to_bin, write_obj_file, and any helper function you implement in asm_parser.c
test1.asm – example assembly file
PennSim.jar – a copy of PennSim to check your assembler
Object File Format Refresher
The following is the format for the binary .obj files created by PennSim from your .asm files. It represents the contents of memory (both program and data) for your assembled LC-4 Assembly programs. In a .obj file, there are 3 basic sections, indicated by 3 header types: Code, Data, and Symbol.
● Code: 3-word header (xCADE, <address>, <n>), n-word body comprising the instructions.
○ This corresponds to the .CODE directive in assembly.
● Data: 3-word header (xDADA, <address>, <n>), n-word body comprising the initial data values.
○ This corresponds to the .DATA directive in assembly.
● Symbol: 3-word header (xC3B7, <address>, <n>), n-character body comprising the symbol string. These are generated when you create labels (such as “END”) in assembly. Each symbol is its own section.
○ Each character in the file is 1 byte, not 2 bytes.
○ There is no NULL terminator.
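As a small illustration of the Code section layout, the sketch below arranges a header and body as a sequence of 16-bit words in memory. The helper name layout_code_section is hypothetical and not part of the starter code; it only builds the words, and writing them to disk (most significant byte first) is a separate step covered in Part 5.

```c
#include <stddef.h>

#define CODE_HEADER 0xCADE  /* first word of every Code section header */

/* Hypothetical helper: fill out[] with one Code section as 16-bit words:
 * xCADE, <address>, <n>, then the n instruction words from body[].
 * out must have room for n + 3 words; returns the number of words used. */
size_t layout_code_section(unsigned short address,
                           const unsigned short *body, size_t n,
                           unsigned short *out)
{
    size_t i;

    out[0] = CODE_HEADER;
    out[1] = address;
    out[2] = (unsigned short) n;
    for (i = 0; i < n; i++) {
        out[3 + i] = body[i];
    }
    return n + 3;
}
```

For test1.asm, n is 7, so the section occupies 10 words (20 bytes).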
Requirements
General Requirements
● You MUST NOT change the filenames of any file provided to you in the starter code.
● You MUST NOT change the function declarations of any function provided to you in the starter code.
● Your program MUST compile when running the command make.
● You MUST NOT have any compile-time errors or warnings.
● You MUST remove or comment out all debugging print statements before submitting.
● You MUST NOT use externs or global variables.
● You SHOULD comment your code since this is a programming best practice.
● Your program MUST be able to handle .asm files that PennSim would successfully assemble. We will not be testing with invalid .asm files.
● Your program MUST NOT crash/segmentation fault.
● You MUST provide a makefile with the following targets:
○ assembler
○ asm_parser.o
○ all, clean, clobber
Assembler
assembler.c: main
● You MUST NOT change the first four statements already provided (the variable declarations at the top of main).
● The main function:
○ MUST read the arguments provided to the program.
■ the user will use your program like this:
./assembler test1.asm
○ MUST store the first argument into filename.
○ MUST print an error1 message if the user has not provided an input filename.
○ MUST call read_asm_file to populate program[][].
○ MUST parse each instruction in program[][] and store the binary string equivalent into program_bin_str[][].
○ MUST convert each binary string into an integer (which MUST have the correct value when formatted with “0x%X”) and store the value into program_bin[].
○ MUST write out the program into a .obj object file which MUST be loadable by PennSim’s ld command.
asm_parser.c: read_asm_file
This function reads the user file.
● It SHOULD print an error2 message if there is any error opening or reading the file.
● It MUST read the exact contents of the file into memory, and it MUST remove any newline characters present in the file.
● It MUST work for files that have an empty line at the end and also for files that end on an instruction (i.e. do not assume there will always be an empty line at the end of the file).
● It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 2 on failure).
asm_parser.c: parse_instruction
This function parses a single instruction and determines the binary string equivalent.
● It SHOULD use strtok to tokenize the instruction, using spaces and commas as the delimiters.
● It MUST determine the instruction function and call the appropriate parse_xxx helper function.
● It MUST parse ADD, MUL, SUB, DIV, AND, OR, XOR instructions.
○ It MUST parse ADD IMM and AND IMM if attempting that extra credit.
● It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 3 on failure).
asm_parser.c: parse_add
This function parses an ADD instruction and provides the binary string equivalent.
● It MUST consider the first argument to be the full instruction.
● It MUST correctly update the opcode, sub-opcode, and register fields following the LC4 ISA.
● It SHOULD call a helper function parse_reg, but we will not be testing this function.
● It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 4 on failure).
asm_parser.c: parse_xxx
You MUST create a helper function similar to parse_add for the other instruction functions required in parse_instruction.
● They MUST consider the first argument to be the full instruction.
● They MUST correctly update the opcode, sub-opcode, and register fields following the LC4 ISA.
● They SHOULD call a helper function parse_reg, but we will not be testing this function.
● They MUST return 0 on success, and they MUST return a non-zero number in the case of failure (each SHOULD print a useful error message and return a unique error number on failure).
asm_parser.c: str_to_bin
This function converts a C string containing 1s and 0s into an unsigned short integer.
● It MUST correctly convert the binary string to an unsigned short int which can be verified using the “0x%X” format.
● It SHOULD use strtol to do the conversion.
asm_parser.c: write_obj_file
This function writes the program, in integer format, as a LC4 object file using the LC4 binary format.
● It MUST create and write an empty file if the input file is empty.
● It MUST change the extension of the input file to .obj.
● It MUST use the default starting address 0x0000 unless you are attempting the .ADDR extra credit.
● It MUST close the file with fclose.
● It MUST return 0 on success, and it MUST return a non-zero number in the case of failure (it SHOULD print a useful error message and return 7 on failure).
● The generated file MUST load into PennSim (and you MUST check this before submitting), and the contents MUST match the .asm assembly program.
Extra Credit
Option 1: modify your read_asm_file function to ignore comments in .asm files. You MUST handle all types of comments for credit.
Option 2: modify your program to handle ADD IMM and AND IMM instructions. Both MUST work completely for credit.
Option 3: modify your program to handle the .CODE and .ADDR directives.
Option 4: modify your program to handle the .DATA, .ADDR, and .FILL directives.
Suggested Approach
This is a suggested approach. You are not required to follow this approach as long as you follow all of the other requirements.
High Level Overview
Follow these high-level steps and debug thoroughly before moving on to the next.
1. Initialize all arrays to zero ('\0' for the character arrays).
2. Call read_asm_file to read the entire .asm file into the array program[][].
a. Using test1.asm as an example, after read_asm_file returns, program[][] should contain:
     0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
0    'A'  'D'  'D'  ' '  'R'  '1'  ','  ' '  'R'  '0'  ','  ' '  'R'  '1'  '\0'
1    'M'  'U'  'L'  ' '  'R'  '2'  ','  ' '  'R'  '1'  ','  ' '  'R'  '1'  '\0'
2    'S'  'U'  'B'  ' '  'R'  '3'  ','  ' '  'R'  '2'  ','  ' '  'R'  '1'  '\0'
3    'D'  'I'  'V'  ' '  'R'  '1'  ','  ' '  'R'  '3'  ','  ' '  'R'  '2'  '\0'
4    'A'  'N'  'D'  ' '  'R'  '1'  ','  ' '  'R'  '2'  ','  ' '  'R'  '3'  '\0'
5    'O'  'R'  ' '  'R'  '1'  ','  ' '  'R'  '3'  ','  ' '  'R'  '2'  '\0' X
6    'X'  'O'  'R'  ' '  'R'  '1'  ','  ' '  'R'  '3'  ','  ' '  'R'  '2'  '\0'
7    '\0' X    X    X    X    X    X    X    X    X    X    X    X    X    X
3. In a loop, for each row X in program[][]:
a. Call parse_instruction, passing it the current row program[X] as its input and program_bin_str[X] as its output. When parse_instruction returns, program_bin_str[X] should contain the binary equivalent of the instruction (in string form).
b. Call str_to_bin, passing program_bin_str[X] to it, and store its return value (the integer equivalent of the binary string) in program_bin[X].
4. Once the loop is complete, program_bin_str[][] should contain the binary-string equivalent of each row of program[][]:
     0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
0    '0'  '0'  '0'  '1'  '0'  '0'  '1'  '0'  '0'  '0'  '0'  '0'  '0'  '0'  '0'  '1'  '\0'
1    '0'  '0'  '0'  '1'  '0'  '1'  '0'  '0'  '0'  '1'  '0'  '0'  '1'  '0'  '0'  '1'  '\0'
2    '0'  '0'  '0'  '1'  '0'  '1'  '1'  '0'  '1'  '0'  '0'  '1'  '0'  '0'  '0'  '1'  '\0'
3    '0'  '0'  '0'  '1'  '0'  '0'  '1'  '0'  '1'  '1'  '0'  '1'  '1'  '0'  '1'  '0'  '\0'
4    '0'  '1'  '0'  '1'  '0'  '0'  '1'  '0'  '1'  '0'  '0'  '0'  '0'  '0'  '1'  '1'  '\0'
5    '0'  '1'  '0'  '1'  '0'  '0'  '1'  '0'  '1'  '1'  '0'  '1'  '0'  '0'  '1'  '0'  '\0'
6    '0'  '1'  '0'  '1'  '0'  '0'  '1'  '0'  '1'  '1'  '0'  '1'  '1'  '0'  '1'  '0'  '\0'
7    '\0' X    X    X    X    X    X    X    X    X    X    X    X    X    X    X    X
5. Also after the loop is complete, the array program_bin[] should contain the numeric value of each string in program_bin_str[][] (shown here in hexadecimal):
0 0x1201
1 0x1449
2 0x1691
3 0x12DA
4 0x5283
5 0x52D2
6 0x52DA
program_bin[] now represents the completely assembled program.
6. Write out the .obj file in binary using the LC4 Object File Format.
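Steps 1 and 3 work together: because the arrays start out zeroed, the first row of program[][] whose first character is '\0' marks the end of the program (the FAQ notes there will be no blank lines between instructions). A minimal sketch of a loop bound computed that way, using a hypothetical helper count_rows; ROWS and COLS are repeated from asm_parser.h only so the block compiles on its own.

```c
#define ROWS 100   /* same values as asm_parser.h */
#define COLS 255

/* Hypothetical helper: number of instruction rows read into program[][].
 * Rows after the last instruction are still all '\0', so counting stops
 * at the first row that starts with '\0'. */
int count_rows(char program[ROWS][COLS])
{
    int n = 0;

    while (n < ROWS && program[n][0] != '\0') {
        n++;
    }
    return n;
}
```

The loop in step 3 could then run X from 0 up to count_rows(program) - 1.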
Great High Level Overview, but I really need a Slightly More Detailed Overview
Okay, I guess we can give some more details.
Part 0: Set Up the main Function to Read the Arguments
Open assembler.c from the helper files; it contains the main function for the program.
Carefully examine the variables at the top:
char* filename = NULL;
char program[ROWS][COLS];
char program_bin_str[ROWS][17];
unsigned short int program_bin[ROWS];
The first variable, filename, is a pointer to a string that contains the name of the text file you’ll be reading. Your program must take in as an argument the name of a .asm file. As an example, once you compile your main program, you would execute it as follows:
./assembler test1.asm
In the last assignment you learned how to use the arguments passed into main. So the first thing to implement is to check argc to see if the program has received any arguments. If it has, point filename to argv[1], the argument that contains the file’s name. You should return from main immediately after printing an error message if the caller doesn’t provide an input file name. For example, something like this:
error1: usage: ./assembler <assembly_file>.asm
Start by updating assembler.c to read in the arguments and store the filename. Compile your changes and test them before continuing.
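A minimal sketch of the argument check, assuming you factor it into a helper. get_filename is a hypothetical name, not part of the starter code, and the error text follows the example message above.

```c
#include <stdio.h>

/* Hypothetical helper: check the command line and point *filename at the
 * first real argument. Returns 0 on success, 1 if no file was given. */
int get_filename(int argc, char **argv, char **filename)
{
    if (argc < 2) {   /* argv[0] is always the program name itself */
        printf("error1: usage: ./assembler <assembly_file>.asm\n");
        return 1;
    }
    *filename = argv[1];
    return 0;
}
```

In main, after the provided declarations, it could be called as if (get_filename(argc, argv, &filename) != 0), returning a non-zero value from main in that case.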
Part 1: Read the .asm File
The next thing to do is to actually read the file into memory. main’s next call will be
int read_asm_file (char* filename, char program [ROWS][COLS] ) ;
The purpose of read_asm_file is to open the .asm file, and place its contents into the 2D array program[][]. You must complete the implementation of this function in the provided helper file asm_parser.c.
Notice that it takes in the pointer to the filename that you’ll open in this function. It also takes in the two dimensional array, program, that was defined back in main.
You’ll also see that ROWS and COLS are two #define’d macros in asm_parser.h. ROWS is set to 100 and COLS is set to 255. This means that you can only read in a program that is up to 100 lines long, and each line of the program (including its terminating null character) must fit in 255 characters. When the program compiles, the preprocessor will replace all instances of ROWS with 100 and all instances of COLS with 255. This means you can #define these values once to avoid Magic Numbers and simplify your program.
You’ll want to look at the class notes (or a C reference textbook) to learn how to use fopen to open the file whose name has been passed in. Then you’ll want to use a function like fgets to read each line of the .asm file into the program[][] 2D array. Be aware that fgets keeps the newline character ('\n') at the end of each line, and files saved with Windows line endings also have a carriage return ('\r') before it; you’ll need to strip these from the input.
Take a look at the test1.asm file that was included in the starter code. It contains the following program:
ADD R1, R0, R1
MUL R2, R1, R1
SUB R3, R2, R1
DIV R1, R3, R2
AND R1, R2, R3
OR R1, R3, R2
XOR R1, R3, R2
After you complete read_asm_file and run it on test1.asm, your 2D array program[][] would contain the contents of the .asm file in this order:
     0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
0    'A'  'D'  'D'  ' '  'R'  '1'  ','  ' '  'R'  '0'  ','  ' '  'R'  '1'  '\0'
1    'M'  'U'  'L'  ' '  'R'  '2'  ','  ' '  'R'  '1'  ','  ' '  'R'  '1'  '\0'
2    'S'  'U'  'B'  ' '  'R'  '3'  ','  ' '  'R'  '2'  ','  ' '  'R'  '1'  '\0'
3    'D'  'I'  'V'  ' '  'R'  '1'  ','  ' '  'R'  '3'  ','  ' '  'R'  '2'  '\0'
4    'A'  'N'  'D'  ' '  'R'  '1'  ','  ' '  'R'  '2'  ','  ' '  'R'  '3'  '\0'
5    'O'  'R'  ' '  'R'  '1'  ','  ' '  'R'  '3'  ','  ' '  'R'  '2'  '\0' X
6    'X'  'O'  'R'  ' '  'R'  '1'  ','  ' '  'R'  '3'  ','  ' '  'R'  '2'  '\0'
7    '\0' X    X    X    X    X    X    X    X    X    X    X    X    X    X
Notice there are no newline characters at the end of these lines.
If reading in the file is a success, return 0 from the function. If not, return 2 from the function and print an error to the screen:
error2: read_asm_file failed
Implement and test this function carefully before continuing on with the assignment.
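One way to write read_asm_file, sketched under the approach described above (fgets line by line, then stripping the line ending). ROWS and COLS are repeated here only so the block compiles on its own; in your project they come from asm_parser.h.

```c
#include <stdio.h>
#include <string.h>

#define ROWS 100   /* same values as asm_parser.h */
#define COLS 255

/* Sketch of read_asm_file: copy each line of the file into one row of
 * program[][], removing the trailing '\n' and any '\r' before it.
 * Returns 0 on success, 2 on failure. */
int read_asm_file(char *filename, char program[ROWS][COLS])
{
    FILE *src = fopen(filename, "r");
    int row = 0;

    if (src == NULL) {
        printf("error2: read_asm_file failed\n");
        return 2;
    }
    /* fgets stops after '\n', at EOF, or after COLS - 1 characters, so a
     * last line with no newline is still read correctly. */
    while (row < ROWS && fgets(program[row], COLS, src) != NULL) {
        program[row][strcspn(program[row], "\r\n")] = '\0';
        row++;
    }
    if (ferror(src)) {
        fclose(src);
        printf("error2: read_asm_file failed\n");
        return 2;
    }
    fclose(src);
    return 0;
}
```

strcspn returns the length of the line up to its first '\r' or '\n', so writing '\0' at that index removes either kind of line ending and leaves a final line that ends at EOF unchanged.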
Part 2: Parse an Instruction
You only need to parse the following instructions: ADD, MUL, SUB, DIV, AND, OR, XOR. You do not need to implement ADD IMM or AND IMM unless you want to attempt the extra credit.
Once read_asm_file is working properly, go back in main, and call parse_instruction, which is also located in asm_parser.c:
int parse_instruction (char* instr, char* instr_bin_str) ;
Purpose, Arguments, and Return Value
The purpose of this function is to take in a single row of your program[][] array and convert it to its binary equivalent in text form (as a string of 1s and 0s). The argument instr must point to a row in main’s 2D array program[][]. The argument instr_bin_str must point to the corresponding row in main’s 2D array program_bin_str[][].
If no errors are encountered, the function returns 0. If any error occurs in this function, it should print an error message such as:
error3: parse_instruction failed
and then immediately return 3.
Let’s assume you’ve called parse_instruction and instr points to the first row in your program[][] array:
        0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
*instr  'A'  'D'  'D'  ' '  'R'  '1'  ','  ' '  'R'  '0'  ','  ' '  'R'  '1'  '\0'
parse_instruction needs to examine this string and convert it into its binary equivalent. You’ll need to use the LC4 ISA to determine the binary equivalent of an instruction. When your function returns, the memory pointed to by instr_bin_str should look like this:
               0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
*instr_bin_str '0'  '0'  '0'  '1'  '0'  '0'  '1'  '0'  '0'  '0'  '0'  '0'  '0'  '0'  '0'  '1'  '\0'
Notice this isn’t actually binary, but it is the ADD instruction’s binary equivalent in text (C String) form. We will convert this string form of the binary instruction to an integer (hexadecimal) later.
How to implement this function
The purpose of converting the instruction to a binary string (instead of to the binary number it will eventually become), is so that you can build this string up little by little.
Investigate the strtok function in the standard C string library if you haven’t already done so for the last assignment.
strtok allows you to parse a string that is separated by delimiters. In this function you’ll be parsing the string pointed to by instr and you’ll be building up the string pointed to by instr_bin_str. instr will contain spaces and commas (those will be your delimiters).
Your first call to strtok on the instr string should return back the instruction function: ADD, SUB, MUL, DIV, XOR, etc. The only thing common to all 26 instructions in the ISA is that the very first part of them is the instruction function (e.g. ADD). Once you determine the instruction function, you’ll call the appropriate helper function to parse the remainder of the instruction.
As an example, let’s say the instruction function is ADD. Once you’ve determined the instruction function is ADD, you would call the parse_add helper function. It will take the instruction instr as an argument, but also the instr_bin_str string because parse_add will be responsible for determining the binary equivalent for the ADD instruction you are currently working on and it will update instr_bin_str.
int parse_add (char* instr, char* instr_bin_str ) ;
When parse_add returns, and if no errors occurred while parsing the ADD instruction, instr_bin_str should now be complete. At this point, you can return 0 from parse_instruction. If you encounter any errors in this function, you should print an error3 message and return 3.
This is only the first instruction. main will need to call parse_instruction for each row of program[][]; each call uses strtok to get the instruction function, calls the appropriate parse_xxx helper function to finish the instruction, and updates the corresponding instr_bin_str.
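Because strtok overwrites each delimiter it finds with '\0', calling it directly on instr would leave only "ADD" visible in that row, even though parse_add must receive the full instruction. A sketch that finds the instruction function by tokenizing a private copy instead; the helper instr_function and the order of its table are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: index of instr's instruction function in names[],
 * or -1 if it is not recognized. strtok modifies the string it scans, so
 * it works on a copy and instr (the row of program[][]) stays intact. */
int instr_function(const char *instr)
{
    static const char *names[] = {"ADD", "MUL", "SUB", "DIV",
                                  "AND", "OR", "XOR"};
    char copy[256];
    char *token;
    int i;

    strncpy(copy, instr, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';
    token = strtok(copy, " ,");   /* first token: the instruction function */
    if (token == NULL) {
        return -1;
    }
    for (i = 0; i < (int) (sizeof names / sizeof names[0]); i++) {
        if (strcmp(token, names[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```

parse_instruction could then switch on the returned index, call parse_add(instr, instr_bin_str), parse_mul(instr, instr_bin_str), and so on, and print error3 and return 3 when the index is -1.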
Part 3: Parse an ADD Instruction
This function is specific to parsing the ADD instruction, but you will need to write a similar function for each of the different instruction functions.
The helper function parse_add should be called only by the parse_instruction function. It has two char* arguments: instr and instr_bin_str.
int parse_add (char* instr, char* instr_bin_str ) ;
Because this function will only be called when parse_instruction encounters an ADD instruction function, instr will contain an ADD instruction and instr_bin_str should be empty.
Similar to the other functions, if this function encounters no errors it will return 0; if any error occurs, it should print an error4 message and return 4:
error4: parse_add() failed
The purpose of this function is to populate instr_bin_str. When the function starts, the binary opcode can be copied immediately into instr_bin_str[0:3]. Afterwards, strtok can tokenize the remaining string to separate out the registers RD, RS, and RT from the instr string.
For each register, call the parse_reg helper function:
int parse_reg (char reg_num, char* instr_bin_str) ;
This function must take a number in character form and populate instr_bin_str with the appropriate corresponding binary number. For example, if RD = R0 for the ADD instruction, the ‘0’ character would be passed in the argument reg_num. parse_reg then copies the characters 000 into instr_bin_str[4:6].
If any error occurs, parse_reg should print a standard error5 message and return 5; otherwise, it returns 0 upon success.
To implement the parse_reg function, consider using a switch() statement.
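A sketch of parse_reg built around a switch. This version assumes one particular design: instead of writing to a fixed index, it appends the 3 bits with strcat, so the caller fills instr_bin_str from left to right (opcode, Rd, Rs, sub-opcode, Rt). The exact error5 text is an assumption, since the assignment does not specify it.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of parse_reg: append the 3-bit code for register reg_num
 * ('0'..'7') to the end of instr_bin_str. The caller must already have
 * written the earlier fields, e.g. the 4-bit opcode. */
int parse_reg(char reg_num, char *instr_bin_str)
{
    const char *bits;

    switch (reg_num) {
    case '0': bits = "000"; break;
    case '1': bits = "001"; break;
    case '2': bits = "010"; break;
    case '3': bits = "011"; break;
    case '4': bits = "100"; break;
    case '5': bits = "101"; break;
    case '6': bits = "110"; break;
    case '7': bits = "111"; break;
    default:                       /* not a valid LC4 register */
        printf("error5: parse_reg() failed\n");
        return 5;
    }
    strcat(instr_bin_str, bits);
    return 0;
}
```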
This helper function should only parse one register at a time. Also, because it is not specific to the ADD instruction (nearly all instructions contain registers), you can call it from other functions that need their registers converted to binary. Example: parse_mul should also call parse_reg.
Note that parse_add must also populate the sub-opcode field in instr_bin_str[10:12]. When parse_add returns, instr_bin_str should be complete, and parse_instruction should then return to main.
You will need to create a helper function for each instruction type, using parse_add as a model. For example, you’ll need to create parse_mul, parse_xor, etc. They will all be very similar functions, so perfect parse_add before you attempt the others.
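A sketch of parse_add and parse_mul sharing a hypothetical helper, parse_rrr, for the common "OP Rd, Rs, Rt" layout. It assumes the appending parse_reg design (written here as a table lookup rather than a switch, and repeated so the block compiles on its own), copies instr before tokenizing so the caller’s row is unchanged, and uses 8 as an arbitrary unique error number for parse_mul. Any helper you add, such as parse_rrr or the hypothetical reg_digit, must also be declared in asm_parser.h.

```c
#include <stdio.h>
#include <string.h>

/* Minimal parse_reg: appends the 3-bit code for '0'..'7'. */
int parse_reg(char reg_num, char *instr_bin_str)
{
    static const char *codes[8] = {"000", "001", "010", "011",
                                   "100", "101", "110", "111"};

    if (reg_num < '0' || reg_num > '7') {
        printf("error5: parse_reg() failed\n");
        return 5;
    }
    strcat(instr_bin_str, codes[reg_num - '0']);
    return 0;
}

/* Hypothetical helper: digit of a register token such as "R3", or '\0'
 * if the token is missing or malformed (parse_reg then reports it). */
char reg_digit(const char *tok)
{
    return (tok != NULL && tok[0] == 'R' && strlen(tok) == 2) ? tok[1] : '\0';
}

/* Hypothetical shared helper for "OP Rd, Rs, Rt" instructions:
 * opcode[0:3] Rd[4:6] Rs[7:9] sub-opcode[10:12] Rt[13:15]. */
int parse_rrr(char *instr, char *instr_bin_str,
              const char *opcode, const char *subop)
{
    char copy[256];
    char *rd, *rs, *rt;

    /* Tokenize a copy so the caller's row of program[][] is untouched. */
    strncpy(copy, instr, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';
    if (strtok(copy, " ,") == NULL) {   /* skip the instruction function */
        return 1;
    }
    rd = strtok(NULL, " ,");
    rs = strtok(NULL, " ,");
    rt = strtok(NULL, " ,");

    strcpy(instr_bin_str, opcode);
    if (parse_reg(reg_digit(rd), instr_bin_str) != 0) return 1;
    if (parse_reg(reg_digit(rs), instr_bin_str) != 0) return 1;
    strcat(instr_bin_str, subop);
    if (parse_reg(reg_digit(rt), instr_bin_str) != 0) return 1;
    return 0;
}

int parse_add(char *instr, char *instr_bin_str)
{
    if (parse_rrr(instr, instr_bin_str, "0001", "000") != 0) {
        printf("error4: parse_add() failed\n");
        return 4;
    }
    return 0;
}

/* MUL differs from ADD only in its sub-opcode (001). */
int parse_mul(char *instr, char *instr_bin_str)
{
    if (parse_rrr(instr, instr_bin_str, "0001", "001") != 0) {
        printf("error8: parse_mul() failed\n");   /* 8: arbitrary unique code */
        return 8;
    }
    return 0;
}
```

SUB, DIV, AND, OR, and XOR follow the same pattern with their opcodes and sub-opcodes from the LC4 ISA (for example, AND uses opcode 0101 with sub-opcode 000).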
Part 4: Converting the binary string to a hexadecimal-formatted integer
After parse_instruction returns successfully to main, main should call str_to_bin:
unsigned short int str_to_bin (char* instr_bin_str) ;
This function should be passed the binary string just produced in program_bin_str[X], where X is the row that was populated by the last call to parse_instruction.
The purpose of this function is to take a binary string and convert it to a 16-bit binary equivalent and return it to the calling function. To implement this function, we recommend using strtol. If strtol returns 0, print an error6 message and return 6.
As an example of what this function should do, if it was called with the following argument:
               0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
*instr_bin_str '0'  '0'  '0'  '1'  '0'  '0'  '1'  '0'  '0'  '0'  '0'  '0'  '0'  '0'  '0'  '1'  '\0'
then it should return 0x1201, which is the hexadecimal equivalent of this binary string. You can verify what it returns by printing the value with printf and the “0x%X” format, which prints integers in hexadecimal.
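A minimal sketch of str_to_bin built on strtol with base 2. The error6 text is an assumption. Note that returning 6 on failure, as described above, cannot be told apart from a real instruction word equal to 6, so main may want to check program_bin_str[X] itself if that distinction matters.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of str_to_bin: interpret a string of '0'/'1' characters as a
 * base-2 number. With at most 16 digits the value fits in 16 bits. */
unsigned short int str_to_bin(char *instr_bin_str)
{
    char *end;
    long value = strtol(instr_bin_str, &end, 2);

    /* Treat 0 as failure, as the assignment suggests (no required
     * instruction assembles to 0x0000); also reject trailing junk. */
    if (value == 0 || *end != '\0') {
        printf("error6: str_to_bin failed\n");
        return 6;
    }
    return (unsigned short int) value;
}
```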
Part 5: Writing the .obj object file
Recall the Code section format from the Object File Format Refresher: a 3-word header (xCADE, <address>, <n>) followed by an n-word body comprising the instructions. This corresponds to the .CODE directive in assembly.
Given this information, the last function to implement is:
int write_obj_file (char* filename, unsigned short int program_bin[ROWS] ) ;
The purpose of this function is to take the assembled program, represented in hexadecimal in program_bin[] and output it to a file with the extension: .obj. It must encode the file using the .obj file format specified in class. If test1.asm was pointed to by filename, your program would open up a file to write to called: test1.obj.
This function should do the following:
1. Take the filename passed in the argument filename and change the last 3 letters to “obj”.
2. Open that file for writing in binary mode. The file you’ll create is not a text file: you are not writing C strings, you are writing binary numbers. Each 2-byte word must be written most significant byte first (see Endianness in the FAQ).
3. Write out the first word in the header: 0xCADE.
4. Write out the address your program should be loaded at. 0x0000 is the default.
5. Count the number of rows that contain data in program_bin[], then write out <n>.
6. Now that the header is complete, write out the <n> rows of data in program_bin[].
7. Close the file using fclose.
If any errors occur, print an appropriate error message and return 7. Otherwise return 0 and main should then return 0 to the caller. Your program is now complete.
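Putting the steps together, a sketch of write_obj_file under a few assumptions: the new name is built by replacing whatever follows the last '.' (equivalent to changing the last 3 letters for a .asm file); rows are counted up to the first 0x0000 entry, which is safe for the required instructions since none of them assembles to 0x0000; and each word goes out through a hypothetical write_word helper that emits the most significant byte first with fputc, so the result does not depend on the host’s endianness.

```c
#include <stdio.h>
#include <string.h>

#define ROWS 100   /* same value as asm_parser.h */

/* Hypothetical helper: write one 16-bit word, most significant byte first. */
int write_word(FILE *out, unsigned short int word)
{
    if (fputc(word >> 8, out) == EOF || fputc(word & 0xFF, out) == EOF) {
        return 1;
    }
    return 0;
}

int write_obj_file(char *filename, unsigned short int program_bin[ROWS])
{
    char obj_name[FILENAME_MAX];
    char *dot;
    FILE *out;
    int n = 0, i, err = 0;

    /* Build "<name>.obj" in a local copy; the caller's string is unchanged. */
    if (strlen(filename) + 5 > sizeof obj_name) {
        printf("error7: write_obj_file failed\n");
        return 7;
    }
    strcpy(obj_name, filename);
    dot = strrchr(obj_name, '.');
    if (dot != NULL) {
        *dot = '\0';
    }
    strcat(obj_name, ".obj");

    while (n < ROWS && program_bin[n] != 0) {   /* rows that hold data */
        n++;
    }

    out = fopen(obj_name, "wb");
    if (out == NULL) {
        printf("error7: write_obj_file failed\n");
        return 7;
    }
    if (n > 0) {   /* an empty .asm file yields an empty .obj file */
        err |= write_word(out, 0xCADE);                 /* Code header */
        err |= write_word(out, 0x0000);                 /* load address */
        err |= write_word(out, (unsigned short int) n); /* word count */
        for (i = 0; i < n && !err; i++) {
            err |= write_word(out, program_bin[i]);
        }
    }
    if (fclose(out) != 0 || err) {
        printf("error7: write_obj_file failed\n");
        return 7;
    }
    return 0;
}
```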
Testing
Validate Output with PennSim
Once you have successfully written an object file from an assembly file, examine the .obj file’s contents using the Linux utility hexdump. From the Linux terminal prompt type:
hexdump test1.obj
hexdump will show you the binary contents. Make certain it matches your expectations!
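As an example, for the program described in the Suggested Approach, the expected hexdump would be the following. By default, hexdump prints each 2-byte word in the x86 host’s little-endian order, so the file’s first two bytes, CA DE, are shown as deca; hexdump -C prints the bytes in file order instead.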
0000000 deca 0000 0700 0112 4914 9116 da12 8352
0000010 d252 da52
0000014
You must test your .obj files in PennSim before submission. If they fail to load, you should expect little, if any, credit for this assignment.
It is your responsibility to test out files other than test1.asm. Also, you must test your .obj files by loading them into PennSim and seeing if they work! Please do this before submitting your work. We will be testing your programs with different .asm files, so you should try out different .asm files of your own.
Files for Testing
We are only providing test1.asm for testing. However, you can (and should) create additional files that test different parts of the program.
For each test file, bring up PennSim, assemble the file, and check the resulting .obj file’s contents with hexdump. Then run your assembler on the same .asm file and check that it produces an identical object file (save a copy of PennSim’s .obj first, since your assembler writes to the same filename). You can create a bunch of test cases very easily with PennSim.
You should test your assembler program on a variety of .asm files, not just simple examples.
Unit Testing
When writing such a large program, it is a good strategy to “unit test.” This means that each time you write a small piece of working code, you compile it and create a simple test for it.
DO NOT write the entire program, compile it, and then start testing it. You will never resolve all of your errors this way. You need to unit test your program as you go along or it will be impossible to debug.
GDB for Debugging
gdb allows you to inspect the actual contents of memory, which is an advantage over print statements: a print statement shows only the characters you chose to format, so invisible bytes such as a stray '\r' or a missing '\0' are easy to miss. Further, gdb lets you see the contents of any variable at any point, while a print statement only prints when it executes during your program.
Reminder: you will need to add the -g flag to all intermediate compilation steps, not just the assembler target, and you will need to use the --args option to tell gdb that your program takes arguments:
gdb -q -tui --args ./assembler test1.asm
Submission
Submission Checks
There is a single “submission check” test that runs once you upload your code to Gradescope. This test checks that you have submitted all four required files and that your program and any autograder code compile successfully. It does not run your program or give any feedback on whether it works; it only ensures that all the required components exist.
The Actual Submission
You will submit this assignment to Gradescope in the assignment entitled Assignment 11: File I/O, Making the LC4 Assembler.
Download all of your .c source and .h header files and your Makefile from Codio to your computer, then upload all four of these files to the Gradescope assignment.
You should not submit any of the provided or your own .asm testing files.
We will only grade the last submission uploaded.
Grading
We will only grade the last submission, regardless of the results of any previous submission.
We will not be providing partial credit for autograder tests.
Assembler
We do provide one example that we will test with, so you can be sure to get those points. You will have to figure out the rest yourself.
This assignment is worth 200 points, normalized to 100% for gradebook purposes.
20 points: correct makefile
30 points: general code inspection (manually graded)
10 points: handling command line arguments and writing the correct file
20 points: correctly handle endianness
60 points: correctly processing test1.asm (which we provide to you)
60 points: correctly processing our other test files (which we do not provide to you)
Extra Credit
The Extra Credit is worth 11 percentage points so the highest grade on the assignment is 111%.
Your extra credit must not break functionality for the non-extra credit requirements. Make a backup of your finalized program before attempting the extra credit. If your program fails to meet the basic requirements, you will end up losing more points than the extra credit will gain.
There is no partial credit. It must work completely for any credit.
We will not give guidance on how to do these since they are designed to be challenge problems.
2 percentage points: modify your read_asm_file function to ignore comments in .asm files. You must handle all types of comments for credit.
2 percentage points: modify your program to handle ADD IMM and AND IMM instructions. Both must work completely for credit (no partial credit for one instruction).
5 percentage points: modify your program to handle the .CODE and .ADDR directives. As a hint, you will need another array to hold the addresses, e.g. unsigned short int address[ROWS].
2 percentage points: modify your program to handle the .DATA directive.
FAQ
Quick Hints
● You are allowed to use the switch statement, a compact way to handle long if/else chains.
● We won’t be testing with string literals in the unit testing.
● You do not need a script file, but you can certainly add one to automate your own testing.
● We will only be testing with valid .asm files. That is, all the test files will assemble correctly in PennSim.
● You can raise an error if the register number for Rx is not valid (e.g. R8), but again, we will not be testing with invalid files.
Formatting
● We will not test with blank lines between instructions, even though PennSim can assemble these without error.
● We do not expect you to use a regex to check if the instruction matches a format.
● Lines can end with trailing spaces, a newline, or just EOF (end of file) if it is the last line in the .asm file.
● strtok is sufficient to break the instruction into its parts (hint: use the delimiter string " ,", which contains a space and a comma). The Assignment 10 instructions have a link to a good resource.
● All characters will be uppercase, even though PennSim can also assemble lowercase input without error. The one exception is the x prefix on hexadecimal values, which appears only in some of the extra credit challenges and will never be written as X.
Endianness
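● The x86 processor used by Codio is little-endian, while LC4 object files store each 2-byte word with its most significant byte first (big-endian). When you fread() a 2-byte word on x86, its bytes are interpreted in the opposite order, so the value appears byte-swapped. That swapping doesn’t occur with fgetc() or with fread() calls that use a size of 1. The same applies when writing: fwrite() of an unsigned short on x86 writes the low byte first, so swap the bytes before writing, or write them one at a time with fputc().
● If you read the .obj file into memory one word at a time using fread(), you will need to swap for endianness. In contrast, if you choose to read the .obj file into memory one byte at a time with fgetc(), the endianness doesn’t need to be adjusted. However, you will have to combine each pair of bytes into a word using bitwise operators.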
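For checking your own output, a sketch of the fgetc approach described above; read_word is a hypothetical helper name, not part of the starter code.

```c
#include <stdio.h>

/* Hypothetical helper: read one big-endian 16-bit word from an LC4 .obj
 * file by fetching two bytes with fgetc and combining them with a shift.
 * Returns 0 on success, 1 at end of file or on a truncated word. */
int read_word(FILE *in, unsigned short int *word)
{
    int hi = fgetc(in);   /* first byte in the file is the most significant */
    int lo = fgetc(in);

    if (hi == EOF || lo == EOF) {
        return 1;
    }
    *word = (unsigned short int) ((hi << 8) | lo);
    return 0;
}
```

If you use fread() with 2-byte words instead, each value can be swapped on x86 with (unsigned short int) ((w >> 8) | (w << 8)).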
Resources
● strtok reference https://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm
● switch statement reference https://www.tutorialspoint.com/cprogramming/switch_statement_in_c.htm
● strtol reference https://www.tutorialspoint.com/c_standard_library/c_function_strtol.htm