
ECE 463/563 Project #3: Dynamic Instruction Scheduling (Version 1.0)


1.  Preliminary Information

Unlike previous projects, the scope of this project is the same for both ECE 463 and ECE 563 students. This is because the OOO pipeline must be modeled in its entirety to measure cycles to execute a trace. Note that the project focuses on modeling data dependencies (only through registers), pipeline stages, and structural hazards (Issue Queue and Reorder Buffer). Therefore, we assume perfect branch prediction and perfect caches, and ignore memory dependencies: you will NOT integrate a BTB, conditional branch predictor, instruction cache/TLB, data cache/TLB, or Load Queue/Store Queue.

You must implement your project using the C, C++, or Java languages, for two reasons. First, these languages are preferred for computer architecture performance modeling. Second, our Gradescope autograder only supports compilation of these languages.

1.4.  Responsibility for self-grading your project via Gradescope

You will submit, validate, and SELF-GRADE your project via Gradescope; the TAs will only manually grade the report. While you are developing your simulator, you are required to frequently check via Gradescope that your code compiles, runs, and gives expected outputs with respect to your current progress. This is necessary to resolve porting issues in a timely fashion (i.e., well before the deadline), caused by different compiler versions in your programming environment and the Gradescope backend. This is also necessary to resolve non-compliance issues (i.e., how you specify the simulator’s command-line arguments, how you format the simulator’s outputs, etc.) in a timely fashion (i.e., well before the deadline).

In this project, you will construct a simulator for an out-of-order superscalar processor that fetches and issues N instructions per cycle.
Only the dynamic scheduling mechanism will be modeled in detail, i.e., perfect caches and perfect branch prediction are assumed.

The simulator reads a trace file in the following format:

<PC> <operation type> <dest reg #> <src1 reg #> <src2 reg #>
<PC> <operation type> <dest reg #> <src1 reg #> <src2 reg #>
…

For example:

ab120024  0   1  2  3
ab120028  1   4  1  3
ab12002c  2  -1  4  7

This means:

“operation type 0”  R1, R2, R3
“operation type 1”  R4, R1, R3
“operation type 2”  -, R4, R7     // no destination register!

Traces are posted on the Moodle website.

The simulator executable built by your Makefile must be named “sim” (the Makefile is discussed in Section 6). Your simulator must accept command-line arguments as follows:

sim <ROB_SIZE> <IQ_SIZE> <WIDTH> <tracefile>

The parameters <ROB_SIZE>, <IQ_SIZE>, and <WIDTH> are explained in Section 5. <tracefile> is the filename of the input trace.

The simulator first outputs the timing information for each dynamic instruction in program order, followed by final outputs (simulator command, processor configuration, and simulation results). See Section 6 regarding the formatting of these outputs and validating your simulator.

The simulator outputs the timing information for each dynamic instruction in the trace, in program order (i.e., in the same order that instructions appear in the trace). The per-instruction timing information is output in the following format:

<seq_no> fu{<op_type>} src{<src1>,<src2>} dst{<dst>} FE{<begin-cycle>,<duration>} DE{} RN{} RR{} DI{} IS{} EX{} WB{} RT{}

<seq_no> is the line number in the trace (i.e., the dynamic instruction count), starting at 0. Substitute 0, 1, or 2 for the <op_type>. <src1>, <src2>, and <dst> are register numbers (include -1 if that is the case). For each of the pipeline stages, indicate the first cycle that the instruction was in that pipeline stage, followed by the number of cycles the instruction was in that pipeline stage.
Here is an example instruction from one of the validation runs.

5 fu{2} src{15,-1} dst{16} FE{5,1} DE{6,1} RN{7,1} RR{8,1} DI{9,1} IS{10,3} EX{13,5} WB{18,1} RT{19,1}

Notice that the begin-cycle of a given pipeline stage equals the begin-cycle of the immediately preceding pipeline stage plus the number of cycles spent in the immediately preceding pipeline stage. For example, the instruction’s first cycle in the EX stage is cycle 13, which is the first cycle in IS (10) plus the number of cycles spent in IS (3).

The simulator outputs the following after completion of the run:

Figure 1.  Overview of microarchitecture to be modeled, including the terminology and parameters used throughout this specification.

Parameters:

Function units: There are WIDTH universal pipelined function units (FUs). Each FU can execute any type of instruction (hence the term “universal”). The operation type of an instruction indicates its execution latency: Type 0 has a latency of 1 cycle, Type 1 has a latency of 2 cycles, and Type 2 has a latency of 5 cycles. Each FU is fully pipelined. Therefore, a new instruction can begin execution on a FU every cycle.

Pipeline registers: The pipeline stages shown in Figure 1 are separated by pipeline registers. In general, this spec names a pipeline register based on the stage that it feeds into. For example, the pipeline register between Fetch and Decode is called DE because it feeds into Decode. A “bundle” is the set of instructions in a pipeline register. For example, if DE is not empty, it contains a “decode bundle”.

Table 1 lists the names of the pipeline registers used in this spec. It also provides a description of each pipeline register and its size (max # instructions).

Table 1.  Names, descriptions, and sizes of all of the pipeline registers.

About register values: For the purpose of determining the number of cycles it takes for the microarchitecture to run a program, the simulator does not need to use and produce actual register values.
This is why the initial Architectural Register File (ARF) values are not provided and the instruction opcodes are omitted from the trace. All that the simulator needs, to determine the number of cycles, is the microarchitecture configuration, execution latencies of instructions (operation type), and register specifiers of instructions (true, anti-, and output dependencies).

This section provides a guide to implementing your simulator.

Call each pipeline stage in reverse order in your main simulator loop, as follows. The comments indicate tasks to be performed.

    // To issue an instruction:
    // 1) Remove the instruction from the IQ.
    // 2) Add the instruction to the
    //    execute_list. Set a timer for the
    //    instruction in the execute_list that
    //    will allow you to model its execution
    //    latency.

    // enough free entries to accept the entire
    // rename bundle, then process (see below)
    // the rename bundle and advance it from
    // RN to RR.
    //
    // Apply your learning from the class
    // lectures/notes on the steps for renaming:
    // (1) allocate an entry in the ROB for the
    // instruction, (2) rename its source
    // registers, and (3) rename its destination
    // register (if it has one). Note that the
    // rename bundle must be renamed in program
    // order (fortunately the instructions in
    // the rename bundle are in program order).

    } while (Advance_Cycle());
    // Advance_Cycle performs several functions.
    // First, it advances the simulator cycle.
    // Second, when it becomes known that the
    // pipeline is empty AND the trace is depleted,
    // the function returns “false” to terminate
    // the loop.

Sample simulation outputs are provided on the Moodle site. These are called “validation runs”. Refer to the validation runs to see how to format the outputs of your simulator.

You must submit, validate, and self-grade[2] your project using Gradescope.
Here is how Gradescope (1) receives your project (zip file), (2) compiles your simulator (Makefile), and (3) runs and checks your simulator (arguments, print-to-console requirement, and “diff -iw”):

runs have whitespace. Note, however, that extra or missing blank lines are NOT ok: “diff -iw” does not ignore extra or missing blank lines.

See the required report template in Moodle for the grading breakdown, experiments, and report contents. Use the report template as the basis for the report that you submit (insert graphs, fill in answers to questions, etc.).

Various deductions (out of 100 points):

-1 point for each day (24-hour period) late, according to the Gradescope timestamp. The late penalty is pro-rated on an hourly basis: -1/24 point for each hour late. We will use the “ceiling” function of the lateness time to get to the next higher hour, e.g., ceiling(10 min. late) = 1 hour late, ceiling(1 hr, 10 min. late) = 2 hours late, and so forth.

For this third and final project, Gradescope will accept late submissions no more than one week after the deadline. The goal of this policy is to allow adequate time for the TAs to grade reports and assess partial credit for simulator development effort (for simulators that don’t match any validation runs), before final grades are due for the semester.

See Section 1.1 for penalties and sanctions for academic integrity violations.

It is good practice to frequently make backups of all your project files, including source code, your report, etc. You can back up files to another hard drive (your NFS B: drive in your NCSU account, home PC, laptop … keep consistent copies in multiple places) or removable media (flash drive, etc.).

Correctness of your simulator is of paramount importance. That said, making your simulator efficient is also important because you will be running many experiments: many superscalar processor configurations and multiple traces.
Therefore, you will benefit from implementing a simulator that is reasonably fast. One simple thing you can do to make your simulator run faster is to compile it with a high optimization level. The example Makefile posted on the Moodle site includes the -O3 optimization flag.

Note that, when you are debugging your simulator in a debugger (such as gdb), it is recommended that you compile without -O3 and with -g. Optimization includes register allocation. Often, register-allocated variables are not displayed properly in debuggers, which is why you want to disable optimization when using a debugger. The -g flag tells the compiler to include symbols (variable names, etc.) in the compiled binary. The debugger needs this information to recognize variable names, function names, line numbers in the source code, etc. When you are done debugging, recompile with -O3 and without -g, to get the most efficient simulator again.

As mentioned in Section 6, another reason for being wary of excessive run times is Gradescope’s autograder timeout.

I have written a tool that allows you to display instruction schedules identical to the ones drawn in class. You may use this tool as an optional visualization aid.

o Download the scope tool from the Moodle website.
o Run your simulator and redirect its output to some filename, <scope-input-file>.

Go to the Moodle website to view an example.

[1] The ISA is MIPS-like: 32 integer registers, 32 floating-point registers, the HI and LO registers (for results of integer multiplication/divide), and the FCC register (floating-point condition code register).
[2] The mystery runs component of your grade will not be published until we release it. The report will be manually graded by the TAs.
