

High Performance Computer Architecture
Revision Exercise 4
Time allowed: 1 hour (for your own revision and practice)    Full score: 50 marks
o This review exercise will help you revise the concepts covered in this course, especially before the examination. It is solely for your OWN REVISION, so there is NO NEED TO SUBMIT the answers to me.
o Try to attempt each question and then review the suggested answers afterwards.
o The whole exercise consists of 4 pages. Check it out yourself.
o Time your handling of this exercise. You are advised to leave the last 5 minutes to check your answers.

Question 1. (25 marks)
Consider the following pipeline:

[Figure: pipeline diagram showing the stage sequences for an integer operation, an LW instruction, an ADDS instruction and a MULS instruction, with one stage marked "Branch condition determined here".]

The execution-stage latencies of the instructions are as follows:
  MULS (floating-point multiply): 4 cycles
  ADDS (floating-point addition): 2 cycles
  Integer operations: 1 cycle
  LW (memory load): 3 cycles (the first cycle is for address calculation, while the remaining two cycles are for memory access)
  One branch delay slot

The following program fragment computes a dot product $c = \sum_i a_i b_i$. Registers r2 and r3 contain the addresses of arrays of floating-point numbers. Register r1 contains the length of the arrays (number of elements). Register f8 (the dot-product result, i.e., c) is initialized to zero.

  dotprod:
      lw    f5, 0(r2)          ; load element from 1st array
      lw    f6, 0(r3)          ; load element from 2nd array
      muls  f7, f5, f6         ; multiply elements
      adds  f8, f8, f7         ; accumulate results
      addi  r2, r2, 4          ; advance array index
      addi  r3, r3, 4          ; advance array index
      addi  r1, r1, -1         ; decrement element count
      bne   r1, zero, dotprod  ; loop if not done
      nop                      ; do nothing but always execute branch delay slot
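For revision, it can help to see what the loop computes before counting cycles. The following is a minimal Python sketch of the same computation, not part of the original exercise. It assumes the element count in r1 is decremented each iteration (as the "decrement element count" comment states) and uses a list index `i` as a stand-in for the r2/r3 pointers. It says nothing about timing.

```python
def dotprod(a, b):
    """Mirror the assembly loop: one multiply-accumulate per iteration."""
    # The assembly tests the count at the bottom of the loop, so the body
    # always runs at least once; require non-empty, equal-length inputs.
    assert len(a) == len(b) and len(a) > 0
    r1 = len(a)   # element count (register r1)
    i = 0         # assumption: index standing in for the r2/r3 pointers
    f8 = 0.0      # accumulator, initialized to zero
    while True:
        f5 = a[i]        # lw   f5, 0(r2)
        f6 = b[i]        # lw   f6, 0(r3)
        f7 = f5 * f6     # muls f7, f5, f6
        f8 = f8 + f7     # adds f8, f8, f7
        i += 1           # addi r2, r2, 4 / addi r3, r3, 4
        r1 -= 1          # decrement element count
        if r1 == 0:      # bne  r1, zero, dotprod
            break
    return f8


print(dotprod([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Reading the body top to bottom shows the dependence chain that matters for the questions below: both loads feed the multiply, and the multiply feeds the accumulate.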
(a) (7 marks) Clearly explain the total number of cycles per iteration to execute the above program on the concerned pipeline when forwarding is allowed for any branch instruction.

(b) (8 marks) Without unrolling the loop, rearrange the code so that the number of cycles per iteration is minimized. Show your rearranged program code. Besides, clearly explain the total number of cycles per iteration to execute your rearranged program on the concerned pipeline when forwarding is not allowed for any branch instruction.
(c) (10 marks) Unroll the loop once, and schedule it to completely avoid stalls. Show your revised program code. Besides, clearly explain the total number of cycles per iteration to execute your revised program on the concerned pipeline when forwarding is not allowed for any branch instruction.
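As a reminder of what "unroll the loop once" means at the source level, here is a hedged Python sketch, not a model answer. It assumes the array length is even, and the extra names `f9`, `f10` and `f11` are hypothetical registers chosen only for illustration. It shows the transformation, not the scheduled assembly or any cycle count.

```python
def dotprod_unrolled(a, b):
    """Dot product with the loop body duplicated: two elements per trip."""
    # Assumption: even, non-zero length, so no clean-up iteration is needed.
    assert len(a) == len(b) and len(a) > 0 and len(a) % 2 == 0
    f8 = 0.0
    for i in range(0, len(a), 2):
        # First copy of the original body.
        f5, f6 = a[i], b[i]
        # Second copy uses different (hypothetical) registers, so the two
        # multiplies are independent of each other.
        f9, f10 = a[i + 1], b[i + 1]
        f7 = f5 * f6
        f11 = f9 * f10
        # Only the accumulations into f8 remain serially dependent.
        f8 = f8 + f7
        f8 = f8 + f11
    return f8


print(dotprod_unrolled([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # 70.0
```

The pointer updates, the count update and the branch now occur once per two elements, and the independent loads and multiplies give the scheduler more instructions to place in the latency gaps.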

Question 2. (25 marks)
(a) (3 marks) Besides the read-after-write (RAW) hazard, there are 2 frequently occurring data hazards in high performance computer systems due to data dependencies in a program. Name and clearly define these 2 types of data hazards with simple code examples for illustration.
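For self-checking, the sketch below classifies the dependence between two instructions as RAW, WAR (write-after-read) or WAW (write-after-write). It is an illustration only: the `Instr` type and `hazards` function are hypothetical helpers, and the operands are taken from the program fragment used in part (c) of this question.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Hazards between `first` and a later instruction `second`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")   # second overwrites what first still has to read
    if second.dest == first.dest:
        found.add("WAW")   # both write the same register
    return found


ld_f6 = Instr("F6", ("R1",))          # LD    F6, 30, R1
multd = Instr("F0", ("F2", "F4"))     # MULTD F0, F2, F4
divd = Instr("F10", ("F0", "F6"))     # DIVD  F10, F0, F6
addd = Instr("F6", ("F8", "F2"))      # ADDD  F6, F8, F2

print(hazards(multd, divd))   # {'RAW'}
print(hazards(divd, addd))    # {'WAR'}
print(hazards(ld_f6, addd))   # {'WAW'}
```

A WAR or WAW pair only becomes a problem when the later instruction can complete before the earlier one, which is why these hazards matter for out-of-order execution.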
(b) (8 marks) Based on the Tomasulo algorithm for program execution, discuss how each of the 2 data hazards as defined in (a) can be resolved, and give a short code example to demonstrate the idea.
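The renaming idea behind Tomasulo's algorithm can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the full algorithm: every instruction gets its own hypothetical tag `RS0`, `RS1`, ... (real hardware has a fixed set of named stations), nothing ever completes, and the function `issue` is an invented helper. The program is the fragment from part (c).

```python
def issue(program):
    """Rename source registers to producer tags in program (issue) order."""
    producer = {}   # register -> tag of the latest instruction that writes it
    renamed = []
    for n, (op, dest, srcs) in enumerate(program):
        tag = f"RS{n}"   # assumption: one fresh tag per instruction
        # A source becomes the tag of its pending producer, or stays as the
        # register name when no earlier instruction in flight writes it.
        operands = tuple(producer.get(s, s) for s in srcs)
        renamed.append((tag, op, operands))
        # Only the most recent writer is recorded for each register.
        producer[dest] = tag
    return renamed, producer


program = [
    ("LD", "F6", ("R1",)),
    ("LD", "F2", ("R2",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]
renamed, producer = issue(program)
print(renamed[4])       # ('RS4', 'DIVD', ('RS2', 'RS0'))
print(producer["F6"])   # RS5
```

DIVD waits on tag `RS0` rather than on register F6, so the later ADDD can write F6 without disturbing it (the WAR case). The status entry for F6 names only `RS5`, so the older load's result is not treated as the final value of F6 (the WAW case).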
(c) (8 marks) The following diagram is intended to show all the relevant statuses of the Tomasulo algorithm at the end of cycle 6 for the following program fragment. However, the diagram is incomplete, with some details missing.
Cycle 6:

Instruction status:
  Instruction   j     k     Issue   Exec. Comp   Write Result
  LD    F6      30    R1    1       3            4
  LD    F2      41    R2    2       4
  MULTD F0      F2    F4    3
  SUBD  F8      F6    F2    4
  DIVD  F10     F0    F6    5
  ADDD  F6      F8    F2

Load buffers:
          Busy   Address
  Load1   No
  Load2   No
  Load3   No

Reservation stations:
                              S1      S2      RS    RS
  Time   Name    Busy  Op     Vj      Vk      Qj    Qk
  1      Add1    Yes   SUBD   M(A1)   M(A2)
         Add2
         Add3    No
  9      Mult1   Yes   MULTD  M(A2)   R(F4)
         Mult2

Register result status:
  Clock        F0      F2      F4   F6     F8     F10     F12   ...   F30
  6      FU    Mult1   M(A2)        Add2   Add1   Mult2

Copy the above diagram into your answer book and complete all the missing details. Besides, clearly explain the major difference between the Scoreboard approach and the Tomasulo algorithm in program execution after the instruction DIVD is issued.
(d) (6 marks) Use a diagram to show all the relevant statuses of the Tomasulo algorithm at the end of cycle 7 for the above program fragment. Besides, clearly state the specific instruction(s) that is/are waiting for the result produced by the reservation station Add1 in that cycle.
END OF REVISION EX. 4
