1.0 <4.3> For the problems in this exercise, assume that there are no pipeline stalls and that the breakdown of executed instructions is as follows:
add | addi | slt | beq | lw | sw | jump | and |
20 % | 10 % | 3 % | 20 % | 25 % | 12 % | 5 % | 5 % |
In what fraction of all cycles is the input of the sign-extend circuit needed? What is the circuit doing in cycles in which the input is not needed?
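As a rough check on the first question, the sketch below sums the share of instructions that carry a 16-bit immediate. It assumes (this is not stated in the problem) that only addi, beq, lw, and sw consume the sign-extend output, and that with no stalls each cycle corresponds to exactly one instruction. The names `mix` and `uses_sign_extend` are illustrative.

```python
# Instruction mix from the table above, in percent.
mix = {"add": 20, "addi": 10, "slt": 3, "beq": 20,
       "lw": 25, "sw": 12, "jump": 5, "and": 5}

# Assumption: only instructions with a 16-bit immediate field
# actually need the sign-extend output.
uses_sign_extend = {"addi", "beq", "lw", "sw"}

# Sanity check: the mix should cover all executed instructions.
assert sum(mix.values()) == 100

needed = sum(pct for name, pct in mix.items() if name in uses_sign_extend)
print(f"sign-extend input needed in {needed}% of cycles")
```

The sign-extend unit is combinational, so in the remaining cycles it still produces an output from instruction bits 15-0; that output is simply not used.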
2.0 <4.4> In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following instruction word:
0000 0000 0000 1100 0001 1011 0000 0000
Assume that data memory is all zeros and that the processor's registers have the following values at the beginning of the cycle in which the above instruction word is fetched:
r0 | r1 | r2 | r3 | r4 | r5 | r6 | r8 | r12 | r31 |
0 | -1 | 2 | -3 | -4 | 10 | 6 | 8 | 2 | -16 |
2.0.1 What are the outputs of the sign-extend and the jump "Shift left 2" unit (near the top of Figure 4.24) for this instruction word?
2.0.2 For the ALU and the two add units, what are their data input values?
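The field extraction behind this exercise can be sketched as follows. The code assumes the standard 32-bit MIPS field layout (opcode 31-26, rs 25-21, rt 20-16, rd 15-11, shamt 10-6, funct 5-0) and that the jump "Shift left 2" unit takes instruction bits 25-0. The helper `bits` is hypothetical, introduced only for illustration.

```python
# Instruction word from the exercise, with the spaces removed.
word = int("0000 0000 0000 1100 0001 1011 0000 0000".replace(" ", ""), 2)

def bits(value, hi, lo):
    """Return bits hi..lo (inclusive) of value as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

# Assumption: standard MIPS R-format field positions.
fields = {
    "opcode": bits(word, 31, 26),
    "rs": bits(word, 25, 21),
    "rt": bits(word, 20, 16),
    "rd": bits(word, 15, 11),
    "shamt": bits(word, 10, 6),
    "funct": bits(word, 5, 0),
}

# Sign-extend: replicate bit 15 into the upper 16 bits of a 32-bit value.
imm16 = bits(word, 15, 0)
sign_extended = imm16 | (0xFFFF0000 if imm16 & 0x8000 else 0)

# Jump "Shift left 2": instruction bits 25..0 shifted left by two (28-bit result).
jump_shifted = bits(word, 25, 0) << 2

print(fields)
print(f"sign-extend output:       0x{sign_extended:08X}")
print(f"jump shift-left-2 output: 0x{jump_shifted:07X}")
```

Both units compute their outputs in every cycle regardless of the instruction type; whether the datapath uses those outputs depends on the control signals for the decoded opcode.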
3.0 <4.5> In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:
IF | ID | EX | MEM | WB |
280 ps | 250 ps | 320 ps | 330 ps | 250 ps |
Also, assume that instructions executed by the processor are broken down as follows:
alu | beq | lw | sw |
35% | 15% | 25% | 25% |
3.0.1 What is the clock cycle time in a pipelined and non-pipelined processor?
3.0.2 Assuming there are no stalls or hazards, what is the utilization of the data memory?
CPE 431/531 Homework #4 Fall 2020
3.0.3 Instead of a single-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction finishes before another instruction is fetched. In this organization, an instruction only goes through the stages it actually needs (e.g., sw only takes 4 cycles because it does not need the WB stage). Compare clock cycle times and execution times for the single-cycle, multi-cycle, and pipelined organizations.
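The comparisons in this exercise reduce to a few sums and maxima, sketched below. The per-class cycle counts in `cycles_needed` are an assumption, not given in the problem: lw uses all five stages, sw and beq skip WB, and ALU instructions skip MEM. If beq were resolved in EX it would need only 3 cycles, which changes the multi-cycle result.

```python
# Stage latencies (ps) and instruction mix (percent) from the tables above.
stage_ps = {"IF": 280, "ID": 250, "EX": 320, "MEM": 330, "WB": 250}
mix_pct = {"alu": 35, "beq": 15, "lw": 25, "sw": 25}

# Pipelined: the slowest stage sets the clock.
pipelined_cycle = max(stage_ps.values())
# Single-cycle (non-pipelined): one cycle must cover every stage.
single_cycle = sum(stage_ps.values())

# Only loads and stores access data memory.
mem_utilization_pct = mix_pct["lw"] + mix_pct["sw"]

# Assumption: number of stages each instruction class needs in the
# multi-cycle design (see the lead-in for the reasoning).
cycles_needed = {"alu": 4, "beq": 4, "lw": 5, "sw": 4}
multi_cycle = pipelined_cycle  # same stage latencies, so same clock
multi_cpi = sum(mix_pct[k] * cycles_needed[k] for k in mix_pct) / 100

# Average time per instruction (ps), ignoring pipeline fill and stalls.
time_single = single_cycle
time_multi = multi_cpi * multi_cycle
time_pipelined = pipelined_cycle

print(f"cycle time: pipelined {pipelined_cycle} ps, single-cycle {single_cycle} ps")
print(f"data memory utilization: {mem_utilization_pct}%")
print(f"multi-cycle CPI: {multi_cpi}, time per instruction: {time_multi} ps")
```

Under these assumptions the multi-cycle design is only slightly faster per instruction than the single-cycle design, while the pipelined design completes one instruction per clock in steady state.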
4.0 <4.5> In this exercise, we examine how data dependences affect execution in the basic 5-stage pipeline described in Section 4.5. Problems in this exercise refer to the following sequence of instructions:
- or $t0, $s2, $t0
- and $t0, $t0, r4
- sw $t0, 24($s6)
- lw $t5, 12($s6)
- sub $t0, $t5, $t1
Also, assume the following cycle times for each of the options related to forwarding:
Without Forwarding | With Full Forwarding | With ALU-ALU Forwarding Only |
250 ps | 300 ps | 290 ps |
Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop instructions to eliminate them.
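One way to approach the nop insertion mechanically is sketched below. It assumes the register file is written in the first half of a cycle and read in the second half, so a consumer must be at least three instruction slots after its producer (two nops between back-to-back dependent instructions). The tuple layout and the function `insert_nops` are hypothetical and exist only for this illustration; destinations and sources are listed by hand rather than parsed.

```python
# (text, destination register or None, source registers)
program = [
    ("or $t0, $s2, $t0", "$t0", ["$s2", "$t0"]),
    ("and $t0, $t0, r4", "$t0", ["$t0", "r4"]),
    ("sw $t0, 24($s6)", None, ["$t0", "$s6"]),
    ("lw $t5, 12($s6)", "$t5", ["$s6"]),
    ("sub $t0, $t5, $t1", "$t0", ["$t5", "$t1"]),
]

# Assumption: without forwarding, a consumer must issue at least
# 3 slots after the instruction that produces its operand.
MIN_DISTANCE = 3

def insert_nops(program, min_distance=MIN_DISTANCE):
    scheduled = []   # output sequence, including inserted nops
    last_write = {}  # register -> index in scheduled of its latest writer
    hazards = []     # (producer text, consumer text, register)
    for text, dest, sources in program:
        needed = 0
        for reg in sources:
            if reg in last_write:
                gap = len(scheduled) - last_write[reg]
                if gap < min_distance:
                    # RAW hazard: the producer has not written back yet.
                    hazards.append((scheduled[last_write[reg]], text, reg))
                    needed = max(needed, min_distance - gap)
        scheduled.extend(["nop"] * needed)
        if dest is not None:
            last_write[dest] = len(scheduled)
        scheduled.append(text)
    return scheduled, hazards

scheduled, hazards = insert_nops(program)
for producer, consumer, reg in hazards:
    print(f"RAW on {reg}: '{producer}' -> '{consumer}'")
print("\n".join(scheduled))
```

Hazards are detected against the already padded sequence, so a dependence that earlier nops have pushed far enough apart is not reported again.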
5.0 Consider the following loop.
Loop: lw $t0, 0($t1)
      and $s6, $s2, $t1
      lw $t9, 0($t0)
      lw $t8, 0($s6)
      beq $t8, $t9, Loop
Assume that perfect branch prediction is used (no stalls due to control hazards), that there are no delay slots, and that the pipeline has full forwarding support. Also, assume that many iterations of this loop are executed before the loop exits.
Show a pipeline execution diagram for the third iteration of this loop, from the cycle in which we fetch the first instruction of that iteration up to (but not including) the cycle in which we can fetch the first instruction of the next iteration. Show all instructions that are in the pipeline during these cycles (not just those of the third iteration).
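Before drawing the diagram, it can help to work out where the stalls fall. The sketch below is a simplified timing model, not a full pipeline simulator: it tracks only the cycle in which each instruction occupies EX. It assumes full forwarding, that a load result can be forwarded to a consumer's EX stage no earlier than two cycles after the load's own EX, that an ALU result is available one cycle later, and that beq compares its operands in EX. The function `ex_cycles` and the tuple layout are hypothetical.

```python
# (text, kind, destination or None, sources)
loop = [
    ("lw $t0, 0($t1)", "load", "$t0", ["$t1"]),
    ("and $s6, $s2, $t1", "alu", "$s6", ["$s2", "$t1"]),
    ("lw $t9, 0($t0)", "load", "$t9", ["$t0"]),
    ("lw $t8, 0($s6)", "load", "$t8", ["$s6"]),
    ("beq $t8, $t9, Loop", "branch", None, ["$t8", "$t9"]),
]

def ex_cycles(instructions, iterations):
    """Return (iteration, text, EX cycle) for every dynamic instruction."""
    ready = {}    # register -> earliest cycle a consumer may be in EX
    schedule = []
    prev_ex = 2   # so the first instruction has IF=1, ID=2, EX=3
    for it in range(1, iterations + 1):
        for text, kind, dest, sources in instructions:
            ex = prev_ex + 1                      # in-order, one per cycle
            for reg in sources:
                ex = max(ex, ready.get(reg, 0))   # wait for forwarded operand
            if dest is not None:
                # Assumption: load data forwards from MEM, ALU data from EX.
                ready[dest] = ex + (2 if kind == "load" else 1)
            schedule.append((it, text, ex))
            prev_ex = ex
    return schedule

schedule = ex_cycles(loop, 4)
starts = [ex for _, text, ex in schedule if text.startswith("lw $t0")]
cycles_per_iteration = [b - a for a, b in zip(starts, starts[1:])]

for it, text, ex in schedule:
    if it == 3:
        print(f"EX in cycle {ex:2d}: {text}")
print("cycles per iteration:", cycles_per_iteration)
```

In this model the only stall in each iteration is the load-use dependence between `lw $t8` and `beq`; the other loads are separated from their consumers by one independent instruction.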