In this homework, you will extend your Homework 4 CPU into a pipelined CPU. The CPU has 32 registers and 1KB of instruction memory. It takes 32-bit binary codes as input, executes the corresponding RISC-V instructions, and saves the results of arithmetic operations into the corresponding registers. In addition to the instructions specified in Homework 4, you have to support lw, sw, and beq. We will check the correctness of your implementation by dumping the value of every register and the data memory after each cycle.
1.1. Load / Store Operations
In this project, you are provided with a “Data_Memory” module and are required to implement the lw and sw instructions. Figure 1 shows the data path after adding data memory to your Homework 4 CPU. The dashed lines are placeholders for modules introduced in the following sections; for now, you can treat them as solid lines.
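As a rough illustration of the extra control signals that lw and sw need, the following is a minimal sketch of a main control decoder. The module name, port names, and the exact set of signals are assumptions made for this example and are not taken from the provided files; the opcodes are the ones listed in Section 1.5. ALUOp is omitted to keep the sketch short.

```verilog
// Hypothetical control decoder; all names here are illustrative assumptions.
module Control_Sketch (
    input  wire [6:0] opcode_i,
    output reg        RegWrite_o,  // write rd in the WB stage
    output reg        MemtoReg_o,  // 1: write back memory data, 0: ALU result
    output reg        MemRead_o,   // lw reads data memory
    output reg        MemWrite_o,  // sw writes data memory
    output reg        ALUSrc_o,    // 1: second ALU operand is the immediate
    output reg        Branch_o     // beq
);
    localparam OP_RTYPE = 7'b0110011;  // and, xor, sll, add, sub, mul
    localparam OP_ITYPE = 7'b0010011;  // addi, srai
    localparam OP_LOAD  = 7'b0000011;  // lw
    localparam OP_STORE = 7'b0100011;  // sw
    localparam OP_BEQ   = 7'b1100011;  // beq

    always @(*) begin
        // Default to "no operation": nothing is written anywhere.
        {RegWrite_o, MemtoReg_o, MemRead_o, MemWrite_o, ALUSrc_o, Branch_o} = 6'b0;
        case (opcode_i)
            OP_RTYPE: RegWrite_o = 1'b1;
            OP_ITYPE: begin
                RegWrite_o = 1'b1;
                ALUSrc_o   = 1'b1;
            end
            OP_LOAD: begin
                RegWrite_o = 1'b1;
                MemtoReg_o = 1'b1;
                MemRead_o  = 1'b1;
                ALUSrc_o   = 1'b1;  // address = rs1 + imm
            end
            OP_STORE: begin
                MemWrite_o = 1'b1;
                ALUSrc_o   = 1'b1;  // address = rs1 + imm
            end
            OP_BEQ: Branch_o = 1'b1;
            default: ;              // unknown opcode: keep the nop defaults
        endcase
    end
endmodule
```

Defaulting every signal to 0 means that an all-zero control bundle behaves like a nop, which is convenient later when the pipeline has to insert bubbles.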
Figure 1 Single-cycle CPU with Data Memory

1.2. Pipeline Registers
To support pipelined execution, the first step is to add pipeline registers to the CPU. Pipeline registers store the control signals and data produced by the previous stage and isolate the stages from one another. Note that you have to use non-blocking assignments in your pipeline registers to get the correct result, and you have to properly initialize the reg values in them.

Figure 2 Datapath after adding pipeline registers (colored in green)

1.3. Data Hazard and Forwarding Control
The first issue after adding pipelined execution is handling data hazards. For example, if we execute “add t3, t1, t2” and “add t4, t1, t3” back to back, t3 is not written back until the 5th cycle, but the second instruction enters the execution stage in the 4th cycle. Without intervention, it would read the stale value of t3 and produce a different result from the single-cycle implementation. To handle such data hazards, we forward the ALU result of the first instruction to the execution stage of the second instruction.

We therefore need a “forwarding unit” that decides whether data should be forwarded from the MEM stage or the WB stage to the EX stage. If the forwarding unit finds that an instruction in a later stage will write to a register that is read by the instruction in the EX stage, the data is forwarded from that later stage so that the arithmetic operation uses the correct value. More specifically, Table 1, Listing 1, and Listing 2 give the textbook's recommended method for resolving data hazards.

Note that the forwarding unit is placed in the EX stage, which means we only forward to the EX stage. You do not have to handle cases in which forwarding to the ID stage is necessary for correctness.
For example,
    add x5, x6, x7
    beq x5, x4, BRANCH
instruction sequences like this will not be involved in our evaluation.

MUX        value   Source   Explanation
ForwardA   00      ID/EX    The first ALU operand comes from the register file.
ForwardA   10      EX/MEM   The first ALU operand is forwarded from the prior ALU result.
ForwardA   01      MEM/WB   The first ALU operand is forwarded from data memory or an earlier ALU result.
ForwardB   00      ID/EX    The second ALU operand comes from the register file.
ForwardB   10      EX/MEM   The second ALU operand is forwarded from the prior ALU result.
ForwardB   01      MEM/WB   The second ALU operand is forwarded from data memory or an earlier ALU result.
Table 1 Forwarding Control

    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd == ID/EX.RegisterRs1)) ForwardA = 10
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 10
Listing 1 EX hazard

    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
                and (EX/MEM.RegisterRd == ID/EX.RegisterRs1))
        and (MEM/WB.RegisterRd == ID/EX.RegisterRs1)) ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
                and (EX/MEM.RegisterRd == ID/EX.RegisterRs2))
        and (MEM/WB.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 01
Listing 2 MEM hazard
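The conditions in Listing 1 and Listing 2 can be sketched as a small combinational module. The module and port names below are assumptions chosen for this example; only the 2-bit encodings (00, 10, 01) come from Table 1. The "not (EX hazard)" term of Listing 2 is expressed here through the priority of the if / else-if chain, so the EX/MEM result wins whenever both stages match.

```verilog
// Hypothetical forwarding unit following Listing 1 and Listing 2.
// Port names are illustrative assumptions.
module Forwarding_Unit_Sketch (
    input  wire [4:0] EX_Rs1_i,        // ID/EX.RegisterRs1
    input  wire [4:0] EX_Rs2_i,        // ID/EX.RegisterRs2
    input  wire       MEM_RegWrite_i,  // EX/MEM.RegWrite
    input  wire [4:0] MEM_Rd_i,        // EX/MEM.RegisterRd
    input  wire       WB_RegWrite_i,   // MEM/WB.RegWrite
    input  wire [4:0] WB_Rd_i,         // MEM/WB.RegisterRd
    output reg  [1:0] ForwardA_o,
    output reg  [1:0] ForwardB_o
);
    // A stage can only be a forwarding source if it writes a register
    // other than x0.
    wire mem_writes = MEM_RegWrite_i && (MEM_Rd_i != 5'd0);
    wire wb_writes  = WB_RegWrite_i  && (WB_Rd_i  != 5'd0);

    always @(*) begin
        // First ALU operand
        if (mem_writes && (MEM_Rd_i == EX_Rs1_i))
            ForwardA_o = 2'b10;        // EX hazard: take EX/MEM result
        else if (wb_writes && (WB_Rd_i == EX_Rs1_i))
            ForwardA_o = 2'b01;        // MEM hazard: take MEM/WB result
        else
            ForwardA_o = 2'b00;        // no hazard: use register file value

        // Second ALU operand
        if (mem_writes && (MEM_Rd_i == EX_Rs2_i))
            ForwardB_o = 2'b10;
        else if (wb_writes && (WB_Rd_i == EX_Rs2_i))
            ForwardB_o = 2'b01;
        else
            ForwardB_o = 2'b00;
    end
endmodule
```

For the sequence “add t3, t1, t2” followed by “add t4, t1, t3”, the second instruction's rs2 matches the rd held in EX/MEM, so this sketch would drive ForwardB_o to 10 while ForwardA_o stays 00.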
Figure 3 Data path after adding Forwarding Control (colored in blue)

1.4. Hazard Detection, Stall, and Flush
Data hazards caused by arithmetic operations can be resolved by forwarding, but a data hazard caused by a load that is immediately followed by an instruction using the loaded value cannot be resolved by forwarding alone and requires a stall. Another major difference between Homework 4 and this project is that we have to support the branch instruction, which causes a control hazard whenever the prediction is wrong.
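Stalls and flushes both act on the pipeline registers from Section 1.2. One possible shape for the IF/ID register is sketched below; the module name, port names, the synchronous flush, and the use of an all-zero word as a bubble are assumptions of this example. In particular, it assumes that the Control unit treats an all-zero instruction as no operation, and it initializes the regs inside the module only to stay self-contained.

```verilog
// Hypothetical IF/ID pipeline register with stall and flush inputs.
// All names and the all-zero "bubble" encoding are assumptions.
module IFID_Sketch (
    input  wire        clk_i,
    input  wire        stall_i,   // 1: keep the current contents (load-use stall)
    input  wire        flush_i,   // 1: discard the fetched instruction (taken branch)
    input  wire [31:0] pc_i,
    input  wire [31:0] instr_i,
    output reg  [31:0] pc_o,
    output reg  [31:0] instr_o
);
    // Start with a bubble so the ID stage sees defined values in cycle 0.
    initial begin
        pc_o    = 32'b0;
        instr_o = 32'b0;
    end

    // Non-blocking assignments: every pipeline register samples its inputs
    // on the same clock edge, before any of them changes its outputs.
    always @(posedge clk_i) begin
        if (flush_i) begin
            pc_o    <= 32'b0;
            instr_o <= 32'b0;
        end
        else if (!stall_i) begin
            pc_o    <= pc_i;
            instr_o <= instr_i;
        end
        // When stall_i is 1 and flush_i is 0, the register holds its value.
    end
endmodule
```

Giving flush priority over stall is an arbitrary choice in this sketch; with the test cases described in this document, the two should not need to be asserted for conflicting reasons in the same cycle.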
Figure 4 An example of a stall from the textbook
As a result, we have to implement a “hazard detection unit” that decides when to stall the pipeline and when to flush because of a control hazard. For the stall, the unit checks whether the rd of a load in the EX stage is the same as rs1 or rs2 of the instruction in the ID stage. If so, it inserts a nop (no operation) into the pipeline to resolve the data hazard.
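The load-use check described above could look like the following sketch. The module name, port names, and the three outputs (Stall_o, PCWrite_o, NoOp_o) are assumptions of this example and may not match the signal names in Figure 5. The condition itself is the textbook one: the instruction in EX reads memory and its rd equals rs1 or rs2 of the instruction in ID.

```verilog
// Hypothetical hazard detection unit for load-use stalls.
// Signal names are illustrative assumptions.
module Hazard_Detection_Sketch (
    input  wire       EX_MemRead_i,  // ID/EX.MemRead: instruction in EX is a load
    input  wire [4:0] EX_Rd_i,       // ID/EX.RegisterRd
    input  wire [4:0] ID_Rs1_i,      // IF/ID.RegisterRs1
    input  wire [4:0] ID_Rs2_i,      // IF/ID.RegisterRs2
    output wire       Stall_o,       // 1: hold the IF/ID register
    output wire       PCWrite_o,     // 0: hold the PC
    output wire       NoOp_o         // 1: zero the control signals entering ID/EX
);
    // The loaded value is not available until the end of the MEM stage,
    // so a dependent instruction must wait one cycle before entering EX.
    wire load_use = EX_MemRead_i &&
                    ((EX_Rd_i == ID_Rs1_i) || (EX_Rd_i == ID_Rs2_i));

    assign Stall_o   = load_use;
    assign PCWrite_o = ~load_use;
    assign NoOp_o    = load_use;
endmodule
```

After the one-cycle bubble, the load is in the MEM/WB register and the dependent instruction is in EX, so the remaining dependency is covered by ordinary forwarding from the WB stage.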
Figure 5 Final Datapath after Adding Hazard Detection Unit (colored in orange)
Note that to reduce the penalty of the branch instruction, we make the branch decision in the ID stage without using the ALU, as recommended by the textbook. You do not have to handle forwarding to the ID stage; we will avoid test cases that require it (see the example at the end of Section 1.3).

1.5. Instructions
In addition to the instructions specified in Homework 4, you have to support three more instructions: lw, sw, and beq. The machine code of all supported instructions is as follows.

funct7         rs2        rs1   funct3   rd            opcode    function
0000000        rs2        rs1   111      rd            0110011   and
0000000        rs2        rs1   100      rd            0110011   xor
0000000        rs2        rs1   001      rd            0110011   sll
0000000        rs2        rs1   000      rd            0110011   add
0100000        rs2        rs1   000      rd            0110011   sub
0000001        rs2        rs1   000      rd            0110011   mul
imm[11:0]                 rs1   000      rd            0010011   addi
0100000        imm[4:0]   rs1   101      rd            0010011   srai
imm[11:0]                 rs1   010      rd            0000011   lw
imm[11:5]      rs2        rs1   010      imm[4:0]      0100011   sw
imm[12,10:5]   rs2        rs1   000      imm[4:1,11]   1100011   beq
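The immediate fields of lw, sw, and beq sit in different bit positions, so the immediate has to be reassembled per instruction format. The following sketch shows one way to do this based on the encodings in the table; the module and port names are assumptions. It also assumes that the beq immediate is emitted as a byte offset with the low bit already set to 0; a data path that shifts left by 1 in a separate block would drop that trailing bit here.

```verilog
// Hypothetical immediate generator; names are illustrative assumptions.
module Imm_Gen_Sketch (
    input  wire [31:0] instr_i,
    output reg  [31:0] imm_o
);
    always @(*) begin
        case (instr_i[6:0])
            // addi, srai, lw: imm[11:0] is instr[31:20].
            // For srai only imm[4:0] is the shift amount, so the ALU
            // should use just the low 5 bits in that case.
            7'b0010011,
            7'b0000011: imm_o = {{20{instr_i[31]}}, instr_i[31:20]};

            // sw: imm[11:5] is instr[31:25], imm[4:0] is instr[11:7].
            7'b0100011: imm_o = {{20{instr_i[31]}}, instr_i[31:25], instr_i[11:7]};

            // beq: imm[12] = instr[31], imm[11] = instr[7],
            //      imm[10:5] = instr[30:25], imm[4:1] = instr[11:8], imm[0] = 0.
            7'b1100011: imm_o = {{19{instr_i[31]}}, instr_i[31], instr_i[7],
                                 instr_i[30:25], instr_i[11:8], 1'b0};

            // R-type instructions have no immediate.
            default:    imm_o = 32'b0;
        endcase
    end
endmodule
```

With the branch decided in the ID stage, the branch target would then be the PC of the beq plus this offset, and the "taken" condition a plain equality comparison of the two register values read in ID.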
1.6. Input / Output Format
Besides the modules listed above, you are also provided with “testbench.v” and “instruction.txt”. After you finish your modules and CPU, compile all of them together with “testbench.v”. A recommended compilation command is
    $ iverilog *.v -o CPU.out
By default, your CPU then loads “instruction.txt”, which should be placed in the same directory as CPU.out, into the instruction memory. This part is written in “testbench.v”; you do not have to change it. “instruction.txt” is a plain text file with one instruction per line, each written as 32 bits (ASCII 0 or 1). For example, the first 3 lines of “instruction.txt” are
0000000_00000_00000_000_01000_0110011 //add $t0,$0,$0
000000001010_00000_000_01001_0010011 //addi $t1,$0,10
000000001101_00000_000_01010_0010011 //addi $t2,$0,13
Note that underscores and any text after “//” (i.e. comments) are ignored. They are inserted simply for human readability. Therefore, the CPU should take “00000000000000000000010000110011” and begin executing it in the first cycle, then “00000000101000000000010010010011” in the second cycle, “00000000110100000000010100010011” in the third, and so on.
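The provided testbench already handles the loading, so the following is only a sketch of how such a file can be read, under the assumption that the testbench uses the standard $readmemb system task, which accepts underscores inside numbers and skips “//” comments. The module name, the array name, and the sample file name are hypothetical; the sketch writes its own two-line sample file so that it can be run on its own.

```verilog
// Hypothetical stand-alone demo of loading binary instruction text.
// It does not touch the real instruction.txt or testbench.v.
module Readmemb_Sketch;
    reg [31:0] memory [0:255];   // assumed layout: 1KB = 256 words of 32 bits
    integer fd;

    initial begin
        // Write a small sample file in the same format as instruction.txt.
        fd = $fopen("instruction_sample.txt", "w");
        $fdisplay(fd, "0000000_00000_00000_000_01000_0110011 //add $t0,$0,$0");
        $fdisplay(fd, "000000001010_00000_000_01001_0010011 //addi $t1,$0,10");
        $fclose(fd);

        // Load exactly two words into memory[0] and memory[1].
        $readmemb("instruction_sample.txt", memory, 0, 1);

        // Print the words as plain 32-bit binary strings.
        $display("%b", memory[0]);
        $display("%b", memory[1]);
        $finish;
    end
endmodule
```

Compiled with iverilog and run with vvp, this should print the two instructions without underscores or comments, that is, the same 32-bit strings the CPU is expected to take in its first two cycles.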
Also, if you include the unchanged “testbench.v” in the compilation, the program will generate a plain text file named “output.txt”, which dumps the values of all registers and the data memory at each cycle of execution. The file is self-explanatory.
1.7. Modules You Need to Add or Modify

1.7.1. Control and ALU_Control
Because your CPU has to support load/store instructions in this project, which were not part of Homework 4, you have to add control signals to the Control and ALU_Control modules.

1.7.2. Pipeline Registers
As introduced in Section 1.2, you have to implement 4 pipeline registers that isolate the 5 pipeline stages and pass the essential information on to the next stage.

1.7.3. Forwarding Unit
As introduced in Section 1.3, you need a forwarding unit and two multiplexers to control forwarding from the MEM stage or the WB stage.

1.7.4. Hazard Detection Unit
As introduced in Section 1.4, you need a hazard detection unit to handle the necessary stalls and nops (no operations).

1.7.5. testbench
You have to initialize the reg variables in your pipeline registers before any instruction is executed. It is recommended that you initialize them in the “initial” block of testbench.v. Apart from this register initialization, please do not change the output format (the $fdisplay part) of this file.

1.7.6. Others
You can add more modules than those listed above if you want. Figure 5 is simply a recommended data path for your reference. You are free to change the details as long as your CPU works correctly.

1.7.7. CPU
Use structural modeling to connect the inputs and outputs of the modules, following the data path in Figure 5.

1.8. Reminder
Project 2 will be closely related to this homework. Please make sure you fully understand how to complete this homework; otherwise, you may encounter difficulties in the next project. Plagiarism is strictly prohibited.
2.1. Modules Explanation
In the report, briefly explain how the modules you implemented work. You have to explain them in human-readable sentences. Either English or Chinese is welcome, but not Verilog: explaining Verilog modules in Verilog is meaningless. Simply pasting your code into the report with little or no explanation will get zero points for the report. You have to write in more detail than Section 1.7. Taking “PC.v” as an example, an acceptable report would be:
The PC module reads the clock signal, reset bit, start bit, and next-cycle PC as input, and outputs the PC of the current cycle. This module changes its internal register “pc_o” at the positive edge of the clock signal. When the reset signal is set, the PC is reset to 0. The PC is updated with the next PC only when the start bit is on.
The following report, by contrast, will get zero points:
The inputs of PC are clk_i, rst_i, start_i, and pc_i, and the output is pc_o. It works as follows:
    always @(posedge clk_i or negedge rst_i) begin
        if(rst_i) begin
            pc_o <= 32'b0;
        end
        else begin
            if(start_i)
                pc_o <= pc_i;
            else
                pc_o <= pc_o;
        end
    end
2.2. Members & Teamwork
Specify your team members and your division of work: for example, who writes the pipeline registers, who is in charge of connecting the wires in the CPU, etc.

2.3. Difficulties Encountered and Solutions in This Project
Write down any difficulties you encountered in doing this project and how you finally solved them.

2.4. Development Environment
Please specify the OS (e.g. MacOS, Windows, Ubuntu 18.04) and the compiler (e.g. iverilog) or IDE (e.g. ModelSim) you use in the report, in case we cannot reproduce the result you get on your computer.