Cross-Indexing Assignment
You may have noticed that debuggers (e.g., gdb) do a remarkably good job of understanding source-code-level information entered from the keyboard. Among other things, they know which declaration of a function, variable, or type is live at each point in a program. This is because the compiler (when invoked with appropriate command-line switches) includes symbol table information in its assembly language output. In the current assignment you will leverage this information to implement a web-based cross-reference tool.
Gcc is capable of producing symbol table information in a variety of formats. One of the most comprehensive of these, and the one used by default on Linux, is known as DWARF. It captures details about variables, functions, typedefs, etc., including the locations in the source code (file and line number) at which they are declared. To embed DWARF information in your object file, compile with
gcc -g3 -o myprogram myfile1.c myfile2.c
The -g3 switch instructs gcc to include detailed debugging information, including macro definitions, in the symbol table.
After compilation, DWARF information can be extracted from an executable or object file in any of several ways. Arguably the easiest is to use the llvm-dwarfdump tool, which you can find in /usr/bin/ on the csug machines.
Your task in this assignment is to write a scripting program (call it xref) that uses the output of llvm-dwarfdump and objdump -d (also found in /usr/bin/) to construct a web page that contains side-by-side assembly language and corresponding source code for a given program. The assembly and source should be lined up nicely on the page: the first instruction in each contiguous block of instructions that come from the same line of source should be horizontally aligned with a copy of that source line. Source lines without corresponding assembly instructions (e.g., comments and declarations) should be presented immediately above the first occurrence of the following source line, or at the bottom of the page if there are no more lines with corresponding assembly code. In the rare case that a single assembly instruction is generated for multiple lines of source, the instruction should line up with the final source line. Source for in-lined functions should be displayed in-line, even if it comes from a different source file than do the surrounding lines. Note that some source lines (e.g., loop headers) may contribute to two or more non-contiguous blocks of instructions; in this case, the source line may appear more than once on your web page. For the sake of clarity, you should print the second and subsequent occurrences in a grayed-out color. Vertical white space should be inserted as needed to make the alignment work out.
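One way to think about the alignment rule is as a grouping pass over the instructions in address order. The following is only a minimal sketch, not a required design: it assumes you have already built a list of (address, text, (file, line)) tuples, and the function name group_blocks and the dictionary keys are hypothetical.

```python
from itertools import groupby

def group_blocks(instrs):
    """Group address-ordered instructions into contiguous blocks that
    share a source line.

    instrs: list of (address, text, (file, line)) tuples (assumed input).
    Returns a list of dicts; "repeat" is True when the same source line
    already headed an earlier block and should be printed grayed out.
    """
    seen = set()
    blocks = []
    for src, run in groupby(instrs, key=lambda ins: ins[2]):
        blocks.append({
            "src": src,
            "instrs": [(addr, text) for addr, text, _ in run],
            "repeat": src in seen,
        })
        seen.add(src)
    return blocks
```

A loop header that contributes instructions both before and after the loop body would show up here as two blocks with the same "src", the second one marked as a repeat.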
If the above instructions seem a little vague, that's because they are. This assignment is underspecified. Part of your job is to exercise good judgment and to build a tool that does a reasonable job most of the time. It probably won't be perfect.
Note that your task is in some sense assembly centric: you are to display assembly-language instructions, in address order, and show next to them the corresponding source. In many cases the instructions for a given function (or even an entire file of source code) will be contiguous, but this may not always be the case: compilers are free to reorder blocks of instructions, even to the point of interleaving code from different functions or files. I recommend you keep track of which files contain source that corresponds to some instruction(s) in the assembly code. Within each such file, you can then determine which lines correspond to some assembly. This in turn will allow you to identify the source lines that do not correspond to any assembly, and should therefore be output immediately before the first occurrence of the next line. For extra credit (see below) you are welcome to try formatting your output in a more source-centric way, but this is a significantly more difficult (and more poorly defined) undertaking.
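The bookkeeping recommended above might be sketched as follows for a single source file. This is an illustration under assumptions: the set of line numbers that have assembly is taken as already known, and the name leading_lines is hypothetical.

```python
def leading_lines(total_lines, lines_with_code):
    """For each source line that has assembly, list the code-free lines
    that sit directly above it.

    total_lines: number of lines in the source file.
    lines_with_code: set of 1-based line numbers that map to instructions.
    Returns (before, trailing): before[n] holds the code-free lines to
    print just above the first occurrence of line n; trailing holds the
    leftover lines that belong at the bottom of the page.
    """
    before = {}
    pending = []
    for n in range(1, total_lines + 1):
        if n in lines_with_code:
            before[n] = pending
            pending = []
        else:
            pending.append(n)
    return before, pending
```

For a six-line file in which only lines 3 and 5 have assembly, lines 1 and 2 attach to line 3, line 4 attaches to line 5, and line 6 is left for the bottom of the page.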
In addition to displaying side-by-side assembly and source, you should arrange for every fixed-address control transfer (branch or subroutine call) in the assembly code to be rendered as an HTML link that will scroll or jump the browser to the target of the branch. You can ignore code that transfers indirectly to a location contained in a register. You can also ignore transfers whose targets are outside the code you are presenting (i.e., in a library package). When viewing your web page, a user should be able to jump to the destination of a subroutine call (or the continuation of a loop) by clicking the link in the neighboring assembly code. After following a link, one can return to the original location by using the browser's back button. Be aware that C allows static functions in different source files to have the same name. Be sure you identify and differentiate among these, so your links always go to the right place.
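As a rough illustration of the linking step, the sketch below turns one line of disassembly into HTML. It assumes the tab-separated x86-64 layout that GNU objdump -d typically prints (address, raw bytes, then mnemonic and operands); other versions or architectures may differ. The function name render_line and the anchor naming scheme ("a" followed by the hex address) are hypothetical.

```python
import html
import re

# An instruction line starts with a hex address, a colon, and a tab.
INSTR = re.compile(r"^\s*([0-9a-f]+):\t")
# A direct jump or call is followed by a hex target and a <symbol> note;
# indirect transfers such as "call *%rax" do not match.
BRANCH = re.compile(r"\b(?:j\w+|call\w*)\s+([0-9a-f]+)\s+<")

def render_line(line, known_addrs):
    """Return HTML for one objdump line.

    known_addrs: set of hex address strings that appear on the page, so
    that transfers into library code are left unlinked.
    """
    m = INSTR.match(line)
    if not m:
        return html.escape(line)          # label or blank line
    addr = m.group(1)
    b = BRANCH.search(line)
    if b and b.group(1) in known_addrs:
        tgt = b.group(1)
        body = (html.escape(line[:b.start(1)])
                + f'<a href="#a{tgt}">{tgt}</a>'
                + html.escape(line[b.end(1):]))
    else:
        body = html.escape(line)
    return f'<a id="a{addr}"></a>{body}'
```

Keying anchors on addresses rather than on function names is one way to sidestep the problem of identically named static functions, since each address occurs only once in the executable.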
You may write your program in Perl, Python, Ruby, or (with instructor's permission) some other scripting language. If you're undecided, I recommend either Perl (because of its ubiquity in systems administration) or Ruby (because of its elegant merger of imperative and functional programming). Python has the disadvantage of substantially less succinct notation for pattern matching, which you'll be doing a lot of in this assignment. You will almost certainly want to work on the csug machines: behavioral details of llvm-dwarfdump and objdump vary across versions and platforms, so your code is unlikely to port easily from elsewhere.
Your xref program should be run in a directory containing an executable program (built with -g3) and a collection of C source files from which the program was built (all .c and .h files other than standard library header files). When invoked with the name of the executable (e.g., myprogram) as a command-line argument, xref should
1. run objdump -d myprogram and examine the output to obtain the assembly language version of the program.
2. run llvm-dwarfdump --debug-line myprogram and examine the output to learn the names of the source files and the code ranges in the program corresponding to each line in those files.
3. convert the source code to HTML, with side-by-side assembly and source, and with embedded branch-target links, as described above.
4. place the HTML file(s) into a subdirectory named HTML, with an extra file index.html that contains a link to the main HTML file(s), a location-specific link to the beginning of the code for main, and information about when and where the xref tool was run.
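Step 2 might be approached along the following lines. This is a hedged sketch: it assumes the line table is printed as rows that begin with a hex address followed by line, column, and file-index columns, with flags such as end_sequence at the end of the row, which is what llvm-dwarfdump commonly prints but may vary by version. The helper names are hypothetical, and mapping file indices to file names is left out.

```python
import re
import subprocess

# First four columns of a line-table row: address, line, column, file index.
ROW = re.compile(r"^(0x[0-9a-f]+)\s+(\d+)\s+(\d+)\s+(\d+)\b(.*)$")

def dump_debug_line(exe):
    """Run llvm-dwarfdump on exe and return its text output."""
    result = subprocess.run(["llvm-dwarfdump", "--debug-line", exe],
                            capture_output=True, text=True, check=True)
    return result.stdout

def parse_rows(text):
    """Extract (address, line, file_index, is_end_sequence) tuples."""
    rows = []
    for raw in text.splitlines():
        m = ROW.match(raw.strip())
        if m:
            rows.append((int(m.group(1), 16), int(m.group(2)),
                         int(m.group(4)), "end_sequence" in m.group(5)))
    return rows

def line_ranges(rows):
    """Turn consecutive rows into (start, end, file_index, line) ranges.

    Each row covers the addresses up to the next row; an end_sequence
    row only closes the previous range and starts none of its own.
    """
    ranges = []
    for (addr, line, fidx, end), nxt in zip(rows, rows[1:]):
        if not end and nxt[0] > addr:
            ranges.append((addr, nxt[0], fidx, line))
    return ranges
```

With these pieces, line_ranges(parse_rows(dump_debug_line("myprogram"))) would yield half-open address ranges that can be matched against the instruction addresses from objdump -d.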
For a hint as to what your side-by-side code might look like, you can try running objdump -Sd to see interleaved source and assembly (this feature may be a little buggy; no guarantees). You might also try the disassemble /m command in gdb. Note, however, that you are required to build your pages without using these extra mechanisms; instead, you have to glean the correspondence from the debug-line table in llvm-dwarfdump's output.
If you have taken CSC 252, you may know that the correspondence between source and assembly code is not always very intuitive, especially at higher levels of optimization. You may want to start with programs that have been compiled with -O0. We reserve the right to test (and grade!) your code on arbitrary programs, however, including those that have been compiled with -O3.
Please do not attempt to do anything fancy in your HTML files: we won't be giving credit for this, and it makes grading a lot harder. Stick to the straightforward markup of really old text-only pages.
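To give a sense of how plain the markup can be, here is a minimal sketch that emits a two-column table with no scripts or style sheets. The row format (assembly text, source text, repeat flag) and the name render_rows are hypothetical, and the assembly column is treated as plain text here, so any branch links would have to be produced separately.

```python
import html

def render_rows(rows):
    """Render side-by-side rows as a bare HTML table.

    rows: list of (asm_text, src_text, is_repeat) tuples (assumed input).
    Repeated source lines are wrapped in an old-style gray <font> tag.
    """
    out = ["<html><body><table>"]
    for asm, src, is_repeat in rows:
        src_html = html.escape(src)
        if is_repeat:
            src_html = f'<font color="gray">{src_html}</font>'
        out.append("<tr>"
                   f"<td><pre>{html.escape(asm)}</pre></td>"
                   f"<td><pre>{src_html}</pre></td>"
                   "</tr>")
    out.append("</table></body></html>")
    return "\n".join(out)
```

Putting a multi-line block of instructions in one cell and its source line in the neighboring cell lets the browser supply the vertical white space needed for alignment.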
Division of labor and writeup
As in previous assignments, you may work alone or in teams of two. If you choose to work in pairs, one possible division of labor is for one partner to write the code that inspects the llvm-dwarfdump and objdump output and the other to use this information to create the HTML files. If you do this, be very careful to agree on the information you need, and read each other's code to look for errors.
Be sure to follow all the rules on the Grading page. As with all assignments, use the turn-in script: ~cs254/bin/TURN_IN. Put your write-up in a README.txt or README.pdf file in the directory in which you run the script. Be sure to include your name(s), both of them if you're working as a team. Also be sure to describe any features of your code that the TAs might not immediately notice. To illustrate the functionality of your code, you may want to include test data, contained in subdirectories.
Resources
Documentation on DWARF can be found in various places on the web, notably www.dwarfstd.org. There's a pretty good tutorial introduction at this site. (The tutorial includes some history and some encoding information that you won't actually need, but even those parts are interesting.) The official DWARF4 Standard is also available, but it's over 300 pages long, and almost certainly more than you need.
You should probably look over the llvm-dwarfdump man page (also available from the command line with the man command) to see what information is available, especially if you want to try some of the extra credit options. You are also welcome to look over the objdump man page, but you're only allowed to use its -d output.
The standard reference for Perl is the Camel book, Programming Perl, by Christiansen, d foy, Wall, and Orwant. There's a copy on reserve at Carlson Library. It's also available on-line from UR IP addresses. The perldoc pages are probably the best quick-reference guide (though not necessarily the best way to learn the language). They are available at perldoc.perl.org, and on the CSUG machines as a collection of man pages; type man perl. More extensive on-line resources for Perl can be found at perl.org. Perlmonks is also very good.
For Ruby, visit ruby-lang.org. The site includes several tutorials. The standard (very readable) reference is Programming Ruby 1.9 & 2.0: The Pragmatic Programmers' Guide by Thomas, Fowler, and Hunt. The first edition of this reference is available free on-line.
Extra Credit suggestions
1. Implement (under the control of command-line switches) a version of your tool that organizes the HTML output by source file or function, rather than by assembly-language address.
2. Extend your code to create additional cross-references. Start with function names in the source code, then consider global variables, local variables, and formal parameters. For an extra challenge, try typedef names, struct and union tags, structure and union field names, enum constants, labels, and/or macros.
3. Provide a search facility in your web pages, to quickly get to an identifier.
4. Syntax-color the code in your pages, so that comments, declarations, keywords, constants, etc., are visibly distinguished.
5. Provide links to names declared in standard header files.
6. Extend your work to support additional programming languages.
Trivia Assignment
Before the end of the day on Wednesday, October 23, each student should complete the T4 trivia assignment found on Blackboard.
MAIN DUE DATE:
Friday November 8, 11:59pm; no extensions.
Last Change: 21 October 2019