[SOLVED] 代写 parallel assembly computer architecture graph software Computer Architecture

30 $

File Name: 代写_parallel_assembly_computer_architecture_graph_software_Computer_Architecture.zip
File Size: 781.86 KB

SKU: 3740574603 Category: Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

Or Upload Your Assignment Here:


Computer Architecture
Course code: 0521292B 07. Dependence Handling
Jianhua Li
College of Computer and Information Hefei University of Technology
slides are adapted from CA course of wisc, princeton, mit, berkeley, etc.
The uses of the slides of this course are for educa/onal purposes only and should be
used only in conjunc/on with the textbook. Deriva/ves of the slides must
acknowledge the copyright no/ces of this and the originals.
1

Data Dependence Types
Flow dependence r3  r1 op r2 r5  r3 op r4
Anti dependence r3  r1 op r2 r1  r4 op r5
Output-dependence r3 r1 opr2
r5 r3 opr4
r3 r6 opr7
Read-after-Write (RAW)
Write-after-Read (WAR)
Write-after-Write (WAW)
2

记住如何处理数据相关
• Anti and output dependences are easier to handle
– write to the destination in one stage and in program order
• Flow dependences are more interesting
• Four typical ways of handling flow dependences
– Detect and wait until value is available in register file
– Detect and forward/bypass data to dependent instruction
– Detect and eliminate the dependence at the software level
• No need for the hardware to detect dependence
– Predict the needed value(s), execute “speculatively”, and verify 3

RAW相关处理
• Which one of the following flow dependences
lead to conflicts in the 5-stage pipeline?
addi ra r- –
addi r-ra- addi r-ra- addi r-ra- addi r-ra- addi r-ra-
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
IF
ID
EX
IF
?
ID
IF
4

寄存器数据相关分析
R/I-Type
LW
SW
Br
J
Jr
IF
ID
read RF
read RF
read RF
read RF
read RF
EX
MEM
WB
write RF
write RF
• For a given pipeline, when is there a potential conflict between two data dependent instructions?
– dependence type: RAW, WAR, WAW?
– instruction types involved?
– distance between the two instructions?
5

Safe and Unsafe Movement of Pipeline
j:_rk iFj
j:rk_ iAj
j:rk_ iOj
Reg Read
stage X
Reg Write
Reg Write
i:rk_
RAW Dependence
i:_rk
WAR Dependence
i:rk_
WAW Dependence
Reg Write
stage Y
Reg Read
Reg Write
dist(i,j)  dist(X,Y)  ?U?nsafe to keep j moving dist(i,j) > dist(X,Y)  S?a?fe
6

RAW 相关分析示例
R/I-Type
LW
SW
Br
J
Jr
IF
ID
read RF
read RF
read RF
read RF
read RF
EX
MEM
WB
write RF
write RF
• Instructions IA and IB (where IA comes before IB) have RAW dependence iff
– IB (R/I, LW, SW, Br or JR) reads a register written by IA (R/I or LW) – dist(IA, IB)  dist(ID, WB) = 3
What about WAW and WAR dependence? What about memory data dependence?
7

Pipeline Stall: 解决数据相关
t0
t1
t2
Insth Insti Instj Instk Instl
i
IF
ID
IF
ALU
ID
MEM
ALU
WB
MEM
WB
j
AMWLEBUM IAMWDLEBUM IAMFDLEUM IAFDLU IFD
IF
2. drain all down-stream stages 8
IF
ID
IADLU
IAMDLEUM
IAMWDLEBUM
IF
IFD
IF
IAFDLU
IFD
i: rx  _ jb:u_bblerx jb:u_bbler jb:u_bblerx
dist(i,j)=1
IF
IAMFDLEUM
IAFDLU
IFD
j:_rx x
dist(i,j)=3 wait until its source data value is available dist(i,j)=4 1. stop all up-stream stages
dist(i,j)=2
t3 t4 t5
Stall = make the dependent instruction
IF

如何实现Pipeline Stall PCSrc
0 M u x 1
Add
ID/EX
Control
M
EX/MEM
WB
MEM/WB
EX
M
IF/ID
WB
4
Shift left 2
Add Add result
ALUSrc
Branch
WB
PC
Address
Instruction memory
Read register 1
Read
register 2
Write Registers Read
register data 2
Write data
Instruction 16 32 [15– 0] Sign
Read data 1
6
ALU control
0 M u x 1
RegDst
Read data
memory
Write data
Address
Data
Instruction [20– 16]
extend
MemRead
ALUOp
Instruction [15– 11]
0
M u x
1
Zero
ALU ALU
result 1
M u x
0
• Stall
– disable PC and IR latching; ensure stalled instruction stays in its stage
– Insert “invalid” instructions/nops into the stage following the stalled one (called “bubbles”)
9
Instruction
MemtoReg
MemWrite
RegWrite

Stall Conditions
• Instructions IA and IB (where IA comes before IB) have RAW dependence iff
– IB (R/I, LW, SW, Br or JR) reads a register written by IA (R/I or LW) – dist(IA, IB)  dist(ID, WB) = 3
• Must stall the ID stage when IB in ID stage wants to read a register to be written by IA in EX, MEM or WB stage
10

Stall Condition Evaluation Logic
• Helper functions
– rs(I) returns the rs field of I
– use_rs(I) returns true if I requires RF[rs] and rs!=r0
• Stallwhen:(满足下列条件之一)
– (rs(IRID)==destEX) && use_rs(IRID) && RegWriteEX
– (rs(IRID)==destMEM) && use_rs(IRID) && RegWriteMEM – (rs(IRID)==destWB) && use_rs(IRID) && RegWriteWB
– (rt(IRID)==destEX) && use_rt(IRID) && RegWriteEX
– (rt(IRID)==destMEM) && use_rt(IRID) && RegWriteMEM – (rt(IRID)==destWB) && use_rt(IRID) && RegWriteWB
• It is crucial that the EX, MEM and WB stages continue to advance normally during stall cycles
11

Pipeline Stall对性能的影响
• Each stall cycle corresponds to one lost cycle in
which no instruction can be completed
• For a program with N instructions and S stall cycles, Average CPI=(N+S)/N
• S depends on
– frequency of RAW dependences
– exact distance between the dependent instructions
– distance between dependences
suppose i1 ,i2 and i3 all depend on i0, once i1’s
dependence is resolved, i2 and i3 must be okay too
12

Sample Assembly (P&H)
• for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) { …… }
for2tst:
addi $s1, $s0, -1
slti $t0, $s1, 0
bne $t0, $zero, exit2 sll $t1, $s1, 2
add $t2, $a0, $t1
lw $t3, 0($t2)
lw $t4, 4($t2)
slt $t0, $t4, $t3 beq $t0, $zero, exit2 ………
addi $s1, $s1, -1
j for2tst
3 stalls
3 stalls
3 stalls 3 stalls
3 stalls 3 stalls
How many stalls?
exit2:
13

通过Data Forwarding减少Stall • AlsocalledDataBypassing
• Forward the value to the dependent instruction as soon as it is available
• Dataflow model
– Data value supplied to dependent instruction as soon as it
is available
– Instruction executes when all its operands are available
• Data forwarding brings a pipeline closer to data flow execution principles
14

Data Forwarding (Bypassing)
• 将RF看做状态(state)更加直观
– “add rx ry rz” literally means get values from RF[ry] and RF[rz]
respectively and put result in RF[rx]
• 但是,RF仅仅是通信抽象的一部分

add addi
– “add rx ry rz” means
1. get the results of the last instructions to define the values of RF[ry]
and RF[rz], respectively,
2. until another instruction redefines RF[rx], younger instructions that
refer to RF[rx] should use this instruction’s result
What matters is to maintain the correct “data
flow” between operations, thus rz r- r-
r- rz r-
IF
ID
EX
MEM
WB
IF
ID
IEDX
IMDEM
IWDB
15

利用Forwarding解决RAW相关
• Instructions IA and IB (where IA comes before IB) have
RAW dependence iff
– IB (R/I, LW, SW, Br or JR) reads a register written by IA (R/I or LW) – dist(IA, IB)  dist(ID, WB) = 3
• In other words, if IB in ID stage reads a register written by IA in EX, MEM or WB stage, then the operand required by IB is not yet in RF
• Howforwardingworks?
 retrieve operand from datapath instead of the RF
 retrieve operand from the youngest definition if multiple
definitions are outstanding
16

ID/EX EX/MEM MEM/WB
dist(i,j)=3
M u x
ForwardA
M u x
ForwardB
M u x
ALU
Registers
dist(i,j)=1
Data memory
dist(i,j)=2
M u x
internal forward?
Rs
Rt
Rt
Rd
EX/MEM.RegisterRd
Forwarding unit
MEM/WB.RegisterRd
dist(i,j)=3
b. With forwarding
a. No forwarding
Data Forwarding Paths (v1)
17

a. No forwarding
dist(i,j)=3
Registers
Data Forwarding Paths (v2)
ID/EX
EX/MEM MEM/WB
M u x
ForwardA
M u x
ForwardB
M u x
ALU
b. With forwarding
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Assumes RF forwards internally
Rs
Rt
Rt
Rd
Forwarding unit
dist(i,j)=1
Data memory
EX/MEM.RegisterRd
MEM/WB.RegisterRd
dist(i,j)=2
M u x
18

Data Forwarding Logic (for v2)
if (rsEX!=0) && (rsEX==destMEM) && RegWriteMEM then forward operand from MEM stage // dist=1
else if (rsEX!=0) && (rsEX==destWB) && RegWriteWB then forward operand from WB stage // dist=2
else
use operand from register file // dist >= 3
Ordering matters!! Must check youngest match first Why doesn’t use_rs( ) appear in the forwarding logic?
19

Data Forwarding (相关分析)
R/I-Type
LW
SW
Br
J
Jr
IF
ID
use
EX
use
produce
use
use
use
MEM
produce
(use)
WB
• Even with data-forwarding, RAW dependence on an immediately preceding LW instruction requires a stall
20

Sample Assembly, No Forwarding (P&H)
• for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) { …… }
for2tst:
addi $s1, $s0, -1
slti $t0, $s1, 0
bne $t0, $zero, exit2 sll $t1, $s1, 2
add $t2, $a0, $t1
lw $t3, 0($t2)
lw $t4, 4($t2)
slt $t0, $t4, $t3 beq $t0, $zero, exit2 ………
addi $s1, $s1, -1
j for2tst
3 stalls
3 stalls
3 stalls 3 stalls
3 stalls 3 stalls
exit2:
21

Sample Assembly, Revisited (P&H)
• for (j=i-1; j>=0 && v[j] > v[j+1]; j-=1) { …… }
for2tst:
addi $s1, $s0, -1
slti $t0, $s1, 0
bne $t0, $zero, exit2 sll $t1, $s1, 2
add $t2, $a0, $t1
lw $t3, 0($t2)
lw $t4, 4($t2)
nop
slt $t0, $t4, $t3 beq $t0, $zero, exit2 ………
addi $s1, $s1, -1
j for2tst
exit2:
22

控制相关处理
23

回顾: 控制相关
• Question: What should the fetch PC be in the next cycle?
• Answer: The address of the next instruction
– All instructions are control dependent on previous ones. Why?
• If the fetched instruction is a non-control-flow instruction: – Next Fetch PC is the address of the next-sequential instruction
– Easy to determine if we know the size of the fetched instruction
• If the instruction that is fetched is a control-flow instruction:
– How do we determine the next Fetch PC?
• In fact, how do we even know whether or not the fetched instruction is a control-flow instruction?
24

分支类型
Type
Direction at fetch time
Number of possible next fetch addresses?
When is next fetch address resolved?
Conditional
Unknown
2
Execution (register dependent)
Unconditional
Always taken
1
Decode (PC + offset)
Call
Always taken
1
Decode (PC + offset)
Return
Always taken
Many
Execution (register dependent)
Indirect
Always taken
Many
Execution (register dependent)
Different branch types can be handled differently
25

如何处理控制相关
• Critical to keep the pipeline full with correct sequence of dynamic instructions.
• Potential solutions if the instruction is a control-flow instruction:
– Stall the pipeline until we know the next fetch address – Guess the next fetch address (branch prediction)
– Employ delayed branching (branch delay slot)
– ……
26

Stall Fetch Until Next PC is Available: Good Idea?
t0 t1 t2 t3 t4 t5
Insth Insti Instj
Instk Instl
IF
ID
ALU
MEM
WB
IF
IF
ID
ALU
MEM
WB
IF
IF
ID
ALU
IF
IF
This is the case with non-control-flow and unconditional br instructions!
27

Doing Better than Stalling Fetch …
• Rather than waiting for true-dependence on PC to resolve, just guess nextPC = PC+4 to keep fetching every cycle
Is this a good guess?
What do you lose if you guessed incorrectly?
• ~20% of the instruction mix is control flow
– ~50 % of “forward” control flow (i.e., if-then-else) is taken – ~90% of “backward” control flow (i.e., loop back) is taken
Overall, typically ~70% taken and ~30% not taken [Lee and Smith, 1984]
• Expect “nextPC = PC+4” ~86% of the time, but what about the remaining 14%?
28

Guessing NextPC = PC + 4
• Always predict the next sequential instruction is the next instruction to be executed
• This is a form of next fetch address prediction (and branch prediction)
• How can you make this more effective?
• Idea: Maximize the chances that the next sequential
instruction is the next instruction to be executed
– Software: Lay out the control flow graph such that the “likely next instruction” is on the not-taken path of a branch
• Profile guided code positioningPettis & Hansen, PLDI 1990.
– Hardware: ??? (how can you do this in hardware…) • Cache traces of executed instructionsTrace cache
29

Next Topic Instruction Level Parallelism

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] 代写 parallel assembly computer architecture graph software Computer Architecture
30 $