MIPS-like pipelined processor: computing Yi = (a.Xi + b).Xi + c on a 5-stage pipeline

Consider the following code for a MIPS-like pipelined processor with a 5-stage pipeline, which computes Yi = (a.Xi + b).Xi + c for i = 0..N-1, where N is the number of elements in a vector. For better readability, I have named the registers Rx, Ry, Ra, Rb, Rc, etc. Assume that the register R3 points to an array that contains 4*N, a, b, and c.

      Load  R1, 0(R3)      // Load 4*N (offset just past the last element of array X)
Loop: Sub   R1, R1, #4     // In the first iteration, this ensures that we start with the last element of the array
      Load  Rx, 400(R1)    // Load Xi (Note: array X starts at address 400)
      Load  Ra, 4(R3)      // Load a
      Mul   Ry, Rx, Ra     // Multiply by a
      Load  Rb, 8(R3)      // Load b
      Add   Ry, Ry, Rb     // Add b
      Mul   Ry, Ry, Rx     // Multiply by Xi
      Load  Rc, 12(R3)     // Load c
      Add   Ry, Ry, Rc     // Add c
      Store Ry, 800(R1)    // Store Yi (Note: array Y starts at address 800)
      BNZ   R1, Loop       // Branch back to Loop while R1 is nonzero
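As a reference for what the loop is meant to compute, here is a minimal Python sketch that mirrors the listing one instruction at a time on a dictionary used as word-addressed memory. The function name `run_loop`, the base address chosen for R3, and the bound on N are assumptions made for illustration only; the sketch also assumes the `Load c` line writes Rc, since that is the register the following Add reads.

```python
def run_loop(x, a, b, c):
    """Mirror the assembly loop and return the list Y0..Y(N-1)."""
    n = len(x)
    # Assumption: 1 <= N <= 100 so that X (from 400) and Y (from 800) do not overlap.
    assert 1 <= n <= 100

    mem = {}
    r3 = 0                      # hypothetical address of the parameter block
    mem[r3 + 0] = 4 * n         # 4*N
    mem[r3 + 4] = a
    mem[r3 + 8] = b
    mem[r3 + 12] = c
    for i, xi in enumerate(x):
        mem[400 + 4 * i] = xi   # array X starts at address 400

    r1 = mem[r3 + 0]            # Load  R1, 0(R3)
    while True:
        r1 -= 4                 # Sub   R1, R1, #4
        rx = mem[400 + r1]      # Load  Rx, 400(R1)
        ra = mem[r3 + 4]        # Load  Ra, 4(R3)
        ry = rx * ra            # Mul   Ry, Rx, Ra
        rb = mem[r3 + 8]        # Load  Rb, 8(R3)
        ry = ry + rb            # Add   Ry, Ry, Rb
        ry = ry * rx            # Mul   Ry, Ry, Rx
        rc = mem[r3 + 12]       # Load  Rc, 12(R3)
        ry = ry + rc            # Add   Ry, Ry, Rc
        mem[800 + r1] = ry      # Store Ry, 800(R1)
        if r1 == 0:             # BNZ   R1, Loop falls through when R1 is zero
            break

    return [mem[800 + 4 * i] for i in range(n)]
```

The loop walks the vector from the last element to the first, and the three loads of a, b, and c read the same addresses on every iteration.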
Assume that all ALU operations take one cycle, except multiply, which takes 4 cycles. Assume that result forwarding is used, so that the only additional stalls are as follows: (a) one additional cycle is needed to use the result of a load, and (b) without any prediction, the branch must stall for 2 cycles before the next instruction can be executed. Also assume 2 adder units and 2 multiply units.
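The stall rules can be turned into a coarse counting model. The Python sketch below is not a cycle-accurate simulator: it charges one cycle when an instruction reads the result of the load immediately before it, n-1 = 3 cycles for every multiply (the straight-pipeline rule that a stage needing n cycles delays the following instructions by n-1), and 2 cycles for the unpredicted branch. The names `count_stalls` and `ORIGINAL_LOOP` are hypothetical, and the tuple encoding assumes the `Load c` instruction writes Rc.

```python
LOAD_USE_STALL = 1   # extra cycle to use the result of a load
MUL_CYCLES = 4       # multiply occupies its stage for 4 cycles
BRANCH_STALL = 2     # unpredicted branch stalls 2 cycles

def count_stalls(instrs):
    """Count stall cycles per cause for a list of (op, dest, srcs) tuples."""
    stalls = {"load_use": 0, "multiply": 0, "branch": 0}
    prev = None
    for op, dest, srcs in instrs:
        # Load-use hazard: the previous instruction is a load whose result is read here.
        if prev is not None and prev[0] == "Load" and prev[1] in srcs:
            stalls["load_use"] += LOAD_USE_STALL
        # A 4-cycle multiply delays everything behind it by 3 cycles.
        if op == "Mul":
            stalls["multiply"] += MUL_CYCLES - 1
        if op == "BNZ":
            stalls["branch"] += BRANCH_STALL
        prev = (op, dest, srcs)
    return stalls

# The loop body as listed, before any restructuring.
ORIGINAL_LOOP = [
    ("Sub",   "R1", ["R1"]),
    ("Load",  "Rx", ["R1"]),
    ("Load",  "Ra", ["R3"]),
    ("Mul",   "Ry", ["Rx", "Ra"]),
    ("Load",  "Rb", ["R3"]),
    ("Add",   "Ry", ["Ry", "Rb"]),
    ("Mul",   "Ry", ["Ry", "Rx"]),
    ("Load",  "Rc", ["R3"]),
    ("Add",   "Ry", ["Ry", "Rc"]),
    ("Store", None, ["Ry", "R1"]),
    ("BNZ",   None, ["R1"]),
]
```

Under this model, the loop as listed incurs 3 load-use stall cycles, 6 multiply stall cycles, and 2 branch stall cycles per iteration. Feeding a reordered instruction list to `count_stalls` shows which of these a given restructuring removes.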
(a) Restructure the code to do the following: (i) move any unnecessary operations outside the loop, and (ii) minimize stalls by moving instructions around and suitably changing memory addresses. Explain your restructuring.
(b) Following the restructuring, show the timings for straight pipelined execution of the program. That is, if any pipeline stage needs n>1 cycles, the following instructions are delayed by an additional n-1 cycles. You can use diagrams like those in Figures C.31/C.32 of HP-CO. Indicate where the stalls happen and the number of cycles of each stall. Compute the number of cycles it takes to go through the loop once and issue the BNZ instruction.
(c) What additional stalls can you remove if you unroll the loop once? Show the result of the unrolling and code restructuring.
(d) Assuming N=10, how many total cycles does the loop in (c) take if the branch is always predicted taken and a wrong prediction costs 5 additional cycles to clean up?
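The arithmetic behind the last question can be sketched in Python. The sketch assumes that a correctly predicted taken branch costs no stall cycles, that the only misprediction is the final fall-through out of the loop, and that the setup code before the loop and pipeline fill are counted separately. `cycles_per_trip` is a placeholder for the per-trip count read off your own timing diagram; it is not given in the problem. The function names are hypothetical.

```python
def loop_trips(n, unroll=1):
    """Number of times the loop body runs when each trip handles `unroll` elements."""
    if n % unroll != 0:
        raise ValueError("n must be a multiple of the unroll factor")
    return n // unroll

def total_cycles(n, cycles_per_trip, unroll=1, mispredict_penalty=5):
    """Total loop cycles with an always-taken branch prediction."""
    trips = loop_trips(n, unroll)
    # Always-taken is wrong exactly once: on the last trip, when the branch falls through.
    mispredictions = 1
    return trips * cycles_per_trip + mispredictions * mispredict_penalty
```

With N=10 and the loop unrolled once (two elements per trip), the body runs 5 times: the branch is predicted correctly 4 times and mispredicted once.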

For the code above, consider execution using Tomasulo's algorithm. Keep the restrictions above.
(a) Indicate the cycle in which each instruction is issued (i.e., ready to execute), the cycle in which it actually starts execution, and the cycle in which it completes. Indicate how many cycles it takes to issue the BNZ instruction the first time around the loop. You do not need to draw the full diagram that you saw in the lecture notes; only indicate the dependencies and clock cycle numbers needed to justify your answer.
(b) Repeat (a) under the assumption that we have only 1 multiply unit.
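To see how the number of multiply units changes the issue/start/complete bookkeeping, here is a small Python sketch of a Tomasulo-style timing table. Its conventions are assumptions and may differ from the lecture notes: one instruction issues per cycle in program order, execution can begin no earlier than the cycle after issue, a result can be used from the cycle after it completes, and the oldest ready instruction gets a free unit first. Reservation-station capacity and result-bus contention are ignored. The function name and the three-instruction program in the usage note are hypothetical, not the homework loop.

```python
def tomasulo_times(instrs, latency, units):
    """Return [(issue, start, complete)] for (op, dest, srcs) tuples in program order.

    latency: op -> execution cycles; units: op -> number of functional units.
    """
    n = len(instrs)
    issue = [i + 1 for i in range(n)]   # assumption: one issue per cycle, in order
    start = [None] * n
    done = [None] * n

    # True (RAW) dependencies only; renaming removes WAR and WAW hazards.
    last_writer = {}
    deps = []
    for i, (op, dest, srcs) in enumerate(instrs):
        deps.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    cycle = 0
    while any(s is None for s in start):
        cycle += 1
        for i, (op, dest, srcs) in enumerate(instrs):
            if start[i] is not None or cycle <= issue[i]:
                continue                # already running, or not yet issued
            if any(done[j] is None or done[j] >= cycle for j in deps[i]):
                continue                # an operand is still being produced
            busy = sum(
                1 for j in range(n)
                if instrs[j][0] == op and start[j] is not None and done[j] >= cycle
            )
            if busy < units[op]:        # a functional unit of this kind is free
                start[i] = cycle
                done[i] = cycle + latency[op] - 1
    return list(zip(issue, start, done))
```

For two independent multiplies feeding one add, `[("Mul", "R1", ["Ra", "Rb"]), ("Mul", "R2", ["Rc", "Rd"]), ("Add", "R3", ["R1", "R2"])]` with latencies `{"Mul": 4, "Add": 1}`, two multiply units let the second multiply start one cycle after the first, while a single unit forces it to wait until the first multiply has finished.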
