Computer Architecture
Course code: 0521292B 08. Exploiting ILP
Jianhua Li
College of Computer and Information, Hefei University of Technology
Slides are adapted from the CA courses of Wisconsin, Princeton, MIT, Berkeley, etc.
The slides of this course are for educational purposes only and should be
used only in conjunction with the textbook. Derivatives of the slides must
acknowledge the copyright notices of this and the originals.
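Instruction-Level Parallelism (ILP)
Pipeline CPI:
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
IPC: Instructions per Cycle
Basic block: a straight-line code sequence with no branch in except at the entry and no branch out except at the exit.
A branch occurs roughly every 5~7 instructions, so the ILP available inside one basic block is small.
ILP must therefore be exploited across basic blocks.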
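Loop-level parallelism:
    for (i=1; i<=500; i++) a[i]=a[i]+s;
The iterations are independent of each other. This parallelism can be exploited by loop unrolling or by SIMD execution.

Dependences: data dependence, name dependence, control dependence.

Control dependence, example 1 (LW is control dependent on BEQZ and cannot simply be moved above it):
          DADDU  R2,R3,R4
          BEQZ   R2,L1
          LW     R1,0(R2)
    L1:
Example 2 (the value of R1 used by OR depends on whether the branch is taken):
          DADDU  R1,R2,R3
          BEQZ   R4,L1
          DSUBU  R1,R5,R6
    L1:   ...
          OR     R7,R1,R8
Example 3 (if R4 is not used after skip, DSUBU may be moved above the branch):
          DADDU  R1,R2,R3
          BEQZ   R12,skip
          DSUBU  R4,R5,R6
          DADDU  R5,R4,R9
    skip: OR     R7,R8,R9

Latencies of FP operations:
    Instruction producing result   Instruction using result   Latency (cycles)
    FP ALU op                      Another FP ALU op          3
    FP ALU op                      Store double (S.D)         2
    Load double (L.D)              FP ALU op                  1
    Load double (L.D)              Store double (S.D)         0
Assumptions for the MIPS pipeline: branch delay 1 cycle; integer load delay 1 cycle.

Example 1. Compile to MIPS code, first without and then with scheduling:
    for (i=1; i<=1000; i++) x[i] = x[i] + s;
R1 initially holds the address of the array element with the highest address, 8(R2) is the last element to process, and F2 holds the scalar s.
    Loop: L.D    F0, 0(R1)
          ADD.D  F4, F0, F2
          S.D    F4, 0(R1)
          DADDIU R1, R1, #-8
          BNE    R1, R2, Loop

Without scheduling (10 cycles per iteration, 5 of them stalls):
    Loop: L.D    F0, 0(R1)       1
          (stall)                2
          ADD.D  F4, F0, F2      3
          (stall)                4
          (stall)                5
          S.D    F4, 0(R1)       6
          DADDIU R1, R1, #-8     7
          (stall)                8
          BNE    R1, R2, Loop    9
          (stall)                10

With scheduling (DADDIU moves up behind L.D; S.D moves into the branch delay slot with its offset adjusted to 8):
    Loop: L.D    F0, 0(R1)       1
          DADDIU R1, R1, #-8     2
          ADD.D  F4, F0, F2      3
          (stall)                4
          BNE    R1, R2, Loop    5
          S.D    F4, 8(R1)       6
The iteration drops from 10 to 6 cycles, and the stalls from 5 to 1. Only 3 of the 6 cycles do real work (L.D, ADD.D, S.D); the other 3 (DADDIU, BNE, and the stall) are loop overhead.

Example 2. Unroll the loop 4 times. Assume the initial value of R1 is a multiple of 32, so the iteration count is a multiple of 4.
Register allocation: F0, F4: copy 1; F2: s; F6, F8: copy 2; F10, F12: copy 3; F14, F16: copy 4.

Unrolled, without scheduling:
    Loop: L.D    F0,0(R1)        1
          (stall)                2
          ADD.D  F4,F0,F2        3
          (stall)                4
          (stall)                5
          S.D    F4,0(R1)        6
          L.D    F6,-8(R1)       7
          (stall)                8
          ADD.D  F8,F6,F2        9
          (stall)                10
          (stall)                11
          S.D    F8,-8(R1)       12
          L.D    F10,-16(R1)     13
          (stall)                14
          ADD.D  F12,F10,F2      15
          (stall)                16
          (stall)                17
          S.D    F12,-16(R1)     18
          L.D    F14,-24(R1)     19
          (stall)                20
          ADD.D  F16,F14,F2      21
          (stall)                22
          (stall)                23
          S.D    F16,-24(R1)     24
          DADDIU R1,R1,#-32      25
          (stall)                26
          BNE    R1,R2,Loop      27
          (stall)                28
Result: 28 cycles for 4 elements, 28/4 = 7 cycles per element. Unrolling alone is not efficient.

Unrolled, with scheduling:
    Loop: L.D    F0,0(R1)        1
          L.D    F6,-8(R1)       2
          L.D    F10,-16(R1)     3
          L.D    F14,-24(R1)     4
          ADD.D  F4,F0,F2        5
          ADD.D  F8,F6,F2        6
          ADD.D  F12,F10,F2      7
          ADD.D  F16,F14,F2      8
          S.D    F4,0(R1)        9
          S.D    F8,-8(R1)       10
          DADDIU R1,R1,#-32      11
          S.D    F12,16(R1)      12
          BNE    R1,R2,Loop      13
          S.D    F16,8(R1)       14
Result: 14 cycles for 4 elements, 14/4 = 3.5 cycles per element.

Dynamic scheduling
    DIV.D  F4,F0,F2
    SUB.D  F10,F4,F6
    ADD.D  F12,F6,F14
SUB.D depends on DIV.D through F4 and must stall. ADD.D depends on neither instruction, but in an in-order pipeline it is blocked behind SUB.D.
To allow out-of-order execution (Out-of-Order EXE), the ID stage of the 5-stage pipeline is split into two steps:
    Issue (IS): decode the instruction and check for structural hazards.
    Read Operands (RO): wait until there are no data hazards, then read the operands.
Instructions pass through IS in order but may leave RO out of order.
The 5-stage pipeline has no WAR or WAW hazards; out-of-order execution introduces them:
    DIV.D  F10, F0, F2
    SUB.D  F10, F4, F6
    ADD.D  F6, F8, F14
Here DIV.D and SUB.D both write F10 (WAW), and ADD.D writes F6, which SUB.D reads (WAR).
Two dynamic scheduling schemes are covered: the scoreboard and Tomasulo's algorithm.
Out-of-order completion also complicates exception handling. An exception is precise if, when instruction i raises it, every instruction before i has completed and neither i nor any later instruction has changed the machine state. See Smith, et al., ISCA 85.

Scoreboard: out-of-order completion can create a WAR hazard. Example:
    DIVD  F0,F2,F4
    ADDD  F10,F0,F8
    SUBD  F8,F8,F14
Scoreboard architecture (CDC 6600):
[Figure: CDC 6600 scoreboard organization. Registers connected to the functional units (FP Mult, FP Mult, FP Divide, FP Add, Integer), all controlled by the SCOREBOARD.]
Out-of-order completion => how are WAR and WAW hazards handled?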
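Solution for WAR: stall the write of the result until earlier instructions have read their operands; registers are read only in the Read Operands stage.
Solution for WAW: detect the hazard at issue and stall the new instruction until the earlier instruction completes.
Several instructions must be in execution at once => multiple or pipelined execution units.
The scoreboard keeps track of dependences and of the state of every operation.
The scoreboard replaces the ID, EX, and WB stages with the four steps below.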
1. Issue: decode instructions and check for structural hazards.
If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared (in-order issue).
2. Read operands: wait until there are no data hazards, then read operands.
A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
3. Execution: operate on operands (EX).
The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
4. Write result: finish execution (WB).
Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards. If there is none, the functional unit writes its result. If there is a WAR hazard, the scoreboard stalls the instruction.
Example:
    DIVD  F0,F2,F4
    ADDD  F10,F0,F8
    SUBD  F8,F8,F14
The CDC 6600 scoreboard would stall SUBD until ADDD reads its operands.
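The hazards in this example can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` tuple and the `hazards` helper are hypothetical names introduced here, and the check looks only at register names for one ordered pair of instructions.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical record: opcode, destination register, source registers."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> List[str]:
    """Classify the register hazards that `later` has with respect to `earlier`."""
    found = []
    if earlier.dest in later.srcs:   # later reads what earlier writes
        found.append("RAW")
    if later.dest in earlier.srcs:   # later overwrites what earlier still reads
        found.append("WAR")
    if later.dest == earlier.dest:   # both write the same register
        found.append("WAW")
    return found


divd = Instr("DIVD", "F0", ("F2", "F4"))
addd = Instr("ADDD", "F10", ("F0", "F8"))
subd = Instr("SUBD", "F8", ("F8", "F14"))

print(hazards(divd, addd))  # RAW on F0
print(hazards(addd, subd))  # WAR on F8
```

ADDD must wait for DIVD (RAW on F0), so SUBD could finish first; the WAR on F8 is why the scoreboard holds back SUBD's write.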
Three parts of the scoreboard:
1. Instruction status
Which of the 4 steps the instruction is in.
2. Functional unit status
Indicates the state of the functional unit (FU). Nine fields for each functional unit:
Busy    Indicates whether the unit is busy or not
Op      Operation to perform in the unit (e.g., + or -)
Fi      Destination register
Fj, Fk  Source-register numbers
Qj, Qk  Functional units producing source registers Fj, Fk
Rj, Rk  Flags indicating when Fj, Fk are ready
3. Register result status
Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
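The three tables map naturally onto simple records. Below is a minimal sketch of the functional unit status entry and of the issue check (structural and WAW hazards). The class and function names are assumptions made for illustration; they do not come from the slides or from any CDC 6600 documentation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (hypothetical layout)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # FU that will produce Fj (None if none pending)
    qk: Optional[str] = None   # FU that will produce Fk (None if none pending)
    rj: bool = False           # Fj ready?
    rk: bool = False           # Fk ready?


def try_issue(fu_name: str, op: str, dest: str, src1: str, src2: str,
              fus: Dict[str, FUStatus], result: Dict[str, str]) -> bool:
    """Issue if the FU is free (no structural hazard) and no active
    instruction already targets `dest` (no WAW hazard)."""
    if fus[fu_name].busy or dest in result:
        return False
    qj, qk = result.get(src1), result.get(src2)
    fus[fu_name] = FUStatus(True, op, dest, src1, src2,
                            qj, qk, qj is None, qk is None)
    result[dest] = fu_name     # register result status
    return True


# State while the Integer unit is busy loading F2.
fus = {name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result: Dict[str, str] = {}
try_issue("Integer", "Load", "F2", "R3", "R3", fus, result)
issued = try_issue("Mult1", "Mult", "F0", "F2", "F4", fus, result)
print(issued, fus["Mult1"].qj, fus["Mult1"].rj, fus["Mult1"].rk)
```

After the second call, Mult1 records that F2 will come from the Integer unit (Qj = Integer, Rj = No) while F4 is already available (Rk = Yes).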
Scoreboard example
Execution latencies: ADD 2 cycles, Mult 10 cycles, Divd 40 cycles (the loads take 1 execution cycle in the tables).
Functional units: Integer, Mult1, Mult2, Add, Divide.
Code sequence:
    LD     F6, 34+R2
    LD     F2, 45+R3
    MULTD  F0, F2, F4
    SUBD   F8, F6, F2
    DIVD   F10, F0, F6
    ADDD   F6, F8, F2
Each slide shows the three tables at one clock cycle:
    Instruction status: Instruction, j, k | Issue | Read operands | Execution complete | Write result
    Functional unit status: Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?)
    Register result status: Clock | F0 F2 F4 F6 F8 F10 F12 ... F30 | FU
At cycle 0 all functional units are idle (Busy = No) and no register has a pending writer.

Snapshot at cycle 9:
    Time  Name     Busy  Op    Fi   Fj  Fk  Qj     Qk  Rj   Rk
          Integer  No
    10    Mult1    Yes   Mult  F0   F2  F4             Yes  Yes
          Mult2    No
    2     Add      Yes   Sub   F8   F6  F2             Yes  Yes
          Divide   Yes   Div   F10  F0  F6  Mult1      No   Yes
    Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Cycle-by-cycle trace:
    Cycle 1:  LD F6 issues to the Integer unit.
    Cycle 2:  LD F6 reads operands. Issue 2nd LD? No: the Integer unit is busy (structural hazard).
    Cycle 3:  LD F6 completes execution. Issue MULT? No: issue is in order and the 2nd LD is still waiting.
    Cycle 4:  LD F6 writes its result; the Integer unit becomes free.
    Cycle 5:  LD F2 issues.
    Cycle 6:  LD F2 reads operands. MULTD issues to Mult1 (Qj = Integer, Rj = No, Rk = Yes).
    Cycle 7:  LD F2 completes execution. SUBD issues to Add (Qk = Integer). Read multiply operands? No: F2 is not yet written.
    Cycle 8a: DIVD issues to Divide (Qj = Mult1).
    Cycle 8b: LD F2 writes its result; F2 becomes ready for MULTD and SUBD.
    Cycle 9:  MULTD and SUBD read operands. Issue ADDD? No: the Add unit is busy with SUBD (structural hazard).
    Cycle 11: SUBD completes execution.
    Cycle 12: SUBD writes its result; the Add unit becomes free. Read operands for DIVD? No: F0 is not ready.
    Cycle 13: ADDD issues to Add.
    Cycle 14: ADDD reads operands.
    Cycle 16: ADDD completes execution.
    Cycle 17: Write result of ADDD? No: WAR hazard, DIVD has not yet read F6.
    Cycle 19: MULTD completes execution.
    Cycle 20: MULTD writes its result; F0 becomes ready for DIVD.
    Cycle 21: DIVD reads operands.
    Cycle 22: ADDD writes its result (the WAR hazard is gone). DIVD starts its 40 execution cycles.
    Cycle 61: DIVD completes execution.
    Cycle 62: DIVD writes its result.

Final instruction status (cycle 62):
    Instruction          Issue  Read operands  Execution complete  Write result
    LD    F6, 34+R2        1         2                3                 4
    LD    F2, 45+R3        5         6                7                 8
    MULTD F0, F2, F4       6         9               19                20
    SUBD  F8, F6, F2       7         9               11                12
    DIVD  F10, F0, F6      8        21               61                62
    ADDD  F6, F8, F2      13        14               16                22
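The timing of the scoreboard example can be reproduced with a short simulation. This is a sketch under stated assumptions, not the CDC 6600 control logic: one instruction issues per cycle, every decision in a cycle uses the state left by earlier cycles, the load takes 1 execution cycle, and all names (`Instr`, `simulate`, `LATENCY`, `UNITS`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LATENCY = {"LD": 1, "MULTD": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}
UNITS = {"LD": ["Integer"], "MULTD": ["Mult1", "Mult2"],
         "SUBD": ["Add"], "ADDD": ["Add"], "DIVD": ["Divide"]}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: Tuple[str, ...]
    unit: Optional[str] = None
    issue: Optional[int] = None
    read: Optional[int] = None
    complete: Optional[int] = None
    write: Optional[int] = None


def before(t: Optional[int], clock: int) -> bool:
    """True if the event happened in an earlier cycle."""
    return t is not None and t < clock


def simulate(program: List[Instr]) -> List[Tuple[int, int, int, int]]:
    clock, next_issue = 0, 0
    while any(i.write is None for i in program):
        clock += 1
        for idx, ins in enumerate(program):
            if not before(ins.issue, clock) or ins.write is not None:
                continue
            earlier = program[:idx]
            if ins.read is None:
                # Read operands: every earlier producer must have written (RAW).
                if all(before(e.write, clock) for e in earlier if e.dest in ins.srcs):
                    ins.read = clock
                    ins.complete = clock + LATENCY[ins.op]
            elif before(ins.complete, clock):
                # Write result: earlier readers of dest must have read (WAR).
                if all(before(e.read, clock) for e in earlier if ins.dest in e.srcs):
                    ins.write = clock
        if next_issue < len(program):
            ins = program[next_issue]
            active = [e for e in program
                      if e.issue is not None and not before(e.write, clock)]
            busy = {e.unit for e in active}
            free = [u for u in UNITS[ins.op] if u not in busy]
            # Issue: need a free unit (structural) and no pending writer of dest (WAW).
            if free and all(e.dest != ins.dest for e in active):
                ins.unit, ins.issue = free[0], clock
                next_issue += 1
    return [(i.issue, i.read, i.complete, i.write) for i in program]


program = [
    Instr("LD", "F6", ("R2",)),
    Instr("LD", "F2", ("R3",)),
    Instr("MULTD", "F0", ("F2", "F4")),
    Instr("SUBD", "F8", ("F6", "F2")),
    Instr("DIVD", "F10", ("F0", "F6")),
    Instr("ADDD", "F6", ("F8", "F2")),
]
timeline = simulate(program)
for ins, row in zip(program, timeline):
    print(ins.op, ins.dest, row)
```

Each printed tuple is (issue, read operands, execution complete, write result) and can be compared with the final instruction status table of the example.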
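CDC 6600 scoreboard: summary
Limitations of the 6600 scoreboard:
No forwarding hardware;
Limited to the instructions in a basic block (small window);
Small number of functional units (structural hazards), especially integer/load store units;
Does not issue on structural hazards;
Waits for WAR hazards;
Prevents WAW hazards.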
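Tomasulo's algorithm
Developed for the IBM 360/91, about three years after the CDC 6600 (1966).
Goal: high performance without special compilers.
Differences between the IBM 360 and the CDC 6600 instruction sets:
The IBM 360 has only 2 register specifiers per instruction, versus 3 in the CDC 6600;
The IBM 360 has 4 FP registers, versus 8 in the CDC 6600.
Why study Tomasulo's algorithm?
It led to the Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, and others.

Tomasulo vs. scoreboard
Control and buffers are distributed with the functional units, rather than centralized in a scoreboard;
The FU buffers are called reservation stations; they hold pending operands;
Register names in instructions are replaced by values or by pointers to reservation stations (RS): this is register renaming;
Renaming avoids WAR and WAW hazards;
There are more reservation stations than registers, so the hardware can do optimizations a compiler cannot;
Results go from the RS to the FUs not through the registers but over a Common Data Bus that broadcasts results to all FUs;
Loads and stores are treated as FUs with reservation stations as well.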
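[Figure: Tomasulo organization. FP Op Queue (from the instruction unit), FP registers, load buffers 1-6 (from memory), store buffers 1-3 (to memory), reservation stations 1-3 in front of the FP adders and reservation stations 1-2 in front of the FP multipliers, operation bus, and the Common Data Bus (CDB) connecting the results back to the reservation stations, store buffers, and registers.]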
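Three stages of Tomasulo's algorithm
1. Issue: get an instruction from the FP Op Queue.
If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames the registers).
2. Execution: operate on operands (EX).
When both operands are ready, execute; if not ready, watch the Common Data Bus for the result (this resolves RAW hazards).
3. Write result: finish execution (WB).
Write the result on the Common Data Bus to all awaiting units; mark the reservation station available (Not busy).
Common data bus: data + source (a "come from" bus): 64 bits of data + 4 bits of functional unit source address;
A unit captures the value if the source matches the functional unit it is waiting for; the bus does the broadcast.

Reservation station components
Op      Operation to perform in the unit (e.g., + or -)
Vj, Vk  Values of the source operands
        (Store buffers have a V field: the result to be stored)
Qj, Qk  Reservation stations producing the source operands (the value to be written)
Busy    Indicates that the reservation station or FU is busy
Note: there are no READY flags as in the scoreboard;
in Tomasulo, Qj, Qk = 0 means the operand is ready.
Register result status (Qi): indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register.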
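The role of the Qj/Qk tags and of the CDB broadcast can be sketched as follows. This is an illustrative model only: `ReservationStation` and `broadcast` are hypothetical names, the numeric values are made up, and a `None` tag plays the role of "Qj, Qk = 0".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, valid when the tag is None
    vk: Optional[float] = None
    qj: Optional[str] = None     # tag of the station producing the operand
    qk: Optional[str] = None

    def ready(self) -> bool:
        """Both operands present: the station may start executing."""
        return self.busy and self.qj is None and self.qk is None


def broadcast(tag: str, value: float, stations: List[ReservationStation],
              reg_status: Dict[str, str], regs: Dict[str, float]) -> None:
    """Put (tag, value) on the common data bus: every waiting station and
    every register whose pending writer is `tag` captures the value."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        regs[reg] = value
        del reg_status[reg]


# MULTD F0,F2,F4 sits in Mult1 waiting for the load of F2 (tag Load2).
regs = {"F4": 3.0}
reg_status = {"F2": "Load2", "F0": "Mult1"}
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=regs["F4"], qj="Load2")
print(mult1.ready())                       # False: Vj still missing
broadcast("Load2", 2.5, [mult1], reg_status, regs)
print(mult1.ready(), mult1.vj, regs["F2"])
```

The station never names register F2 after issue; it only remembers the tag Load2, which is what removes WAR and WAW hazards on the architectural register.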
Tomasulo example
Execution latencies: LD 2 cycles, ADD 2 cycles, Mult 10 cycles, Divd 40 cycles.
Load buffers: Load1, Load2, Load3. Reservation stations: Add1, Add2, Add3, Mult1, Mult2.
Code sequence:
    LD     F6, 34+R2
    LD     F2, 45+R3
    MULTD  F0, F2, F4
    SUBD   F8, F6, F2
    DIVD   F10, F0, F6
    ADDD   F6, F8, F2
Each slide shows the tables at one clock cycle:
    Instruction status: Instruction, j, k | Issue | Execution complete | Write result
    Load buffers: Name | Busy | Address
    Reservation stations: Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS for j) | Qk (RS for k)
    Register result status: Clock | F0 F2 F4 F6 F8 F10 F12 ... F30 | FU
At cycle 0 all load buffers and reservation stations are idle (Busy = No).

Snapshot at cycle 5:
    Time  Name   Busy  Op     Vj         Vk         Qj     Qk
    2     Add1   Yes   SUBD   M(34+R2)   M(45+R3)
    0     Add2   No
          Add3   No
    10    Mult1  Yes   MULTD  M(45+R3)   R(F4)
    0     Mult2  Yes   DIVD              M(34+R2)   Mult1
    Register result status: F0 = Mult1, F2 = M(45+R3), F6 = M(34+R2), F8 = Add1, F10 = Mult2

Cycle-by-cycle trace:
    Cycle 1:  LD F6 issues to Load1 (address 34+R2).
    Cycle 2:  LD F2 issues to Load2 (address 45+R3). Note: unlike the CDC 6600, several loads can be outstanding.
    Cycle 3:  MULTD issues to Mult1 (Vk = R(F4), Qj = Load2). Register names are replaced (renamed) in the reservation stations. Load1 is completing; what is waiting for Load1?
    Cycle 4:  LD F6 writes its result. SUBD issues to Add1 (Vj = M(34+R2), Qk = Load2). Load2 is completing; what is waiting for Load2?
    Cycle 5:  LD F2 writes its result. DIVD issues to Mult2 (Vk = M(34+R2), Qj = Mult1). Add1 and Mult1 start executing.
    Cycle 6:  ADDD issues to Add2 (Vk = M(45+R3), Qj = Add1), although DIVD has not yet executed: renaming removes the WAR hazard on F6.
    Cycle 7:  SUBD (Add1) completes execution; what is waiting for Add1?
    Cycle 8:  SUBD writes its result; Add2 receives Vj = M()-M() and starts executing.
    Cycle 10: ADDD (Add2) completes execution; what is waiting for Add2?
    Cycle 11: ADDD writes its result to F6. All the short instructions are now finished.
    Cycles 12-14: Mult1 continues executing (3, 2, 1 cycles left).
    Cycle 15: MULTD (Mult1) completes execution; what is waiting for it?
    Cycle 16: MULTD writes its result; Mult2 receives Vj = M*F4 and DIVD starts its 40 execution cycles.
    Cycle 55: DIVD has 1 cycle left.
    Cycle 56: DIVD (Mult2) completes execution.
    Cycle 57: DIVD writes its result. Issue was in order; execution and completion were out of order.

Final instruction status (cycle 57):
    Instruction          Issue  Execution complete  Write result
    LD    F6, 34+R2        1           3                 4
    LD    F2, 45+R3        2           4                 5
    MULTD F0, F2, F4       3          15                16
    SUBD  F8, F6, F2       4           7                 8
    DIVD  F10, F0, F6      5          56                57
    ADDD  F6, F8, F2       6          10                11
    Register result status: F0 = M*F4, F2 = M(45+R3), F6 = (M-M)+M(), F8 = M()-M(), F10 = M*F4/M
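The timeline of the Tomasulo example follows from a few simple rules, which the sketch below applies. It is a timing model under assumptions that happen to hold for this example, not a full implementation of Tomasulo's algorithm: one instruction issues per cycle, a reservation station is always free, there is no contention for the CDB, and execution starts in the cycle after the last operand is broadcast. The function and table names are hypothetical.

```python
from typing import Dict, List, Tuple

LATENCY = {"LD": 2, "MULTD": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}


def tomasulo_timing(program: List[Tuple[str, str, Tuple[str, ...]]]
                    ) -> List[Tuple[int, int, int]]:
    """Return (issue, execution complete, write result) per instruction."""
    producer: Dict[str, int] = {}     # register -> index of its latest writer (renaming)
    rows: List[Tuple[int, int, int]] = []
    for n, (op, dest, srcs) in enumerate(program):
        issue = n + 1
        # An operand arrives in the cycle its producer writes the CDB.
        arrivals = [rows[producer[s]][2] for s in srcs if s in producer]
        last_operand = max([issue] + arrivals)
        complete = last_operand + LATENCY[op]
        write = complete + 1
        rows.append((issue, complete, write))
        producer[dest] = n            # later readers of dest wait on this instruction
    return rows


program = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]
rows = tomasulo_timing(program)
for (op, dest, _), row in zip(program, rows):
    print(op, dest, row)
```

Because DIVD captured its F6 operand when it issued, ADDD is free to write F6 at cycle 11, long before DIVD finishes at cycle 57.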
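Compare with the scoreboard on the same code (scoreboard, cycle 62):
    Instruction          Issue  Read operands  Execution complete  Write result
    LD    F6, 34+R2        1         2                3                 4
    LD    F2, 45+R3        5         6                7                 8
    MULTD F0, F2, F4       6         9               19                20
    SUBD  F8, F6, F2       7         9               11                12
    DIVD  F10, F0, F6      8        21               61                62
    ADDD  F6, F8, F2      13        14               16                22
Why does the CDC 6600 scoreboard take longer? In its trace the second LD waits for the single Integer unit, ADDD waits for the Add unit, and ADDD's write waits for the WAR hazard on F6.

Tomasulo vs. scoreboard
                          Tomasulo                                   Scoreboard
    Functional units      pipelined (6 load, 3 store, 3 +, 2 x/÷)    multiple (1 load/store, 1 +, 2 x, 1 ÷)
    Instruction window    about 14 instructions                      about 5 instructions
    Structural hazard     no issue                                   no issue
    WAR                   avoided by renaming                        stall write result
    WAW                   avoided by renaming                        stall issue
    Results               broadcast from the FU                      written to / read from registers
    Control               reservation stations                       central scoreboard

Drawbacks of Tomasulo's algorithm
Complexity;
Many high-speed associative buffers are required;
Performance is limited by the CDB: it must reach every functional unit;
The number of functional units that can complete per cycle is limited to one (1)!
Multiple CDBs => more FU logic for the parallel associative stores.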
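Dynamic branch prediction: outline
1 BHT/BPB
2 BTB
3 Hardware-based speculation
As more ILP is exploited, control dependences become the limiting factor:
A processor that issues n instructions per cycle encounters branches n times faster;
By Amdahl's Law, the lower the CPI, the larger the relative impact of control stalls.
Two questions for every branch:
Is the branch taken?
What is the branch target address?
[Figure: instruction stream i-1, i (branch), then i+1, i+2 on the fall-through path, or p, p+1, p+2 on the taken path.]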
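BHT (Branch History Table), also called Branch Prediction Buffer
A small memory indexed by the low-order bits of the branch instruction address; each entry records the recent behavior of the branch.
1-bit BHT: each entry has 1 bit that records whether the branch was taken last time.
Drawback of the 1-bit BHT: a loop branch is mispredicted twice per execution of the loop (on the last iteration and again on the first iteration of the next execution).
2-bit BHT: the prediction changes only after two consecutive mispredictions.
In general: n-bit counters (n>2) perform about as well as 2-bit counters.
States of the 2-bit predictor:
    11: predict taken
    10: predict taken
    00: predict not taken
    01: predict not taken
Operation:
Step 1: when the branch reaches the decode stage (ID), read the prediction from the BHT.
Step 2: when the branch outcome is known, update the BHT entry.
Question: does a BHT help the MIPS 5-stage pipeline? Not by itself: the branch outcome and the target address become known at about the same time.
Accuracy: on SPEC89, a BHT with 4K entries predicts with 82%~99% accuracy;
A 4K-entry BHT performs about as well as an unlimited BHT.
The BHT can be implemented as a small, special cache.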
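The difference between 1-bit and 2-bit prediction is easy to see on a loop branch. The sketch below assumes a 2-bit saturating counter per entry, a common variant whose state transitions may differ slightly from the diagram on the slide, and indexes the table with the low-order bits of the word address. All class and function names are hypothetical.

```python
from typing import Iterable


class OneBitPredictor:
    """Each entry remembers the last outcome of the branch."""

    def __init__(self, entries: int = 4096) -> None:
        self.entries = entries
        self.table = [True] * entries          # assumed start: predict taken

    def predict(self, pc: int) -> bool:
        return self.table[(pc >> 2) % self.entries]

    def update(self, pc: int, taken: bool) -> None:
        self.table[(pc >> 2) % self.entries] = taken


class TwoBitPredictor:
    """Each entry is a saturating counter 0..3; 2 and 3 predict taken."""

    def __init__(self, entries: int = 4096) -> None:
        self.entries = entries
        self.table = [3] * entries             # assumed start: state 11

    def predict(self, pc: int) -> bool:
        return self.table[(pc >> 2) % self.entries] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = (pc >> 2) % self.entries
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


def mispredictions(predictor, pc: int, outcomes: Iterable[bool]) -> int:
    """Predict, compare with the real outcome, then update the entry."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch: taken 9 times, then not taken; the loop runs 10 times.
outcomes = ([True] * 9 + [False]) * 10
one_bit = mispredictions(OneBitPredictor(), 0x400, outcomes)
two_bit = mispredictions(TwoBitPredictor(), 0x400, outcomes)
print(one_bit, two_bit)
```

With these starting states the 1-bit scheme misses on both the exit and the re-entry of the loop, while the 2-bit scheme misses only on the exit.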
More advanced branch predictors:
Correlation Branch Prediction
Tournament Branch Prediction
Bi-mode Branch Prediction
Neuron-based Branch Prediction
Bias-Free Branch Prediction
Check them in the ACM Digital Library or IEEE Xplore.
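Branch-Target Buffer (BTB)
Goal: reduce the branch penalty to 0 cycles.
The BTB is a small cache: the address of the branch instruction is the lookup key, and the entry holds the predicted target PC.
[Figure: the PC of the fetched instruction is compared (=?) with the branch addresses stored in the BTB. N: the instruction is not predicted to be a taken branch and fetch continues sequentially. Y: the instruction is a branch predicted taken and the stored target is used as the next PC.]

Steps with a BTB:
IF: send the PC to memory and to the BTB. Is there an entry in the BTB?
    No  -> ID: is the instruction a taken branch?
               No  -> normal instruction execution.
               Yes -> EX: enter the branch PC and the target PC into the BTB.
    Yes -> send out the predicted PC. Is the branch taken?
               No  -> mispredicted branch: kill the fetched instruction, restart the fetch at the other target, and delete the entry from the BTB.
               Yes -> correctly predicted branch: continue execution (no stalls).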
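The BTB steps can be sketched with a dictionary keyed by the branch PC. This is an illustrative model, assuming 4-byte instructions and a BTB that stores only branches predicted taken; the class and method names are hypothetical.

```python
from typing import Dict, Optional


class BranchTargetBuffer:
    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}      # branch PC -> predicted target PC

    def next_fetch_pc(self, pc: int) -> int:
        """IF stage: a hit supplies the predicted target, a miss falls through."""
        return self.entries.get(pc, pc + 4)

    def resolve(self, pc: int, taken: bool, target: int) -> bool:
        """Called when the branch outcome is known.
        Updates the BTB and returns True if the earlier fetch was correct."""
        predicted: Optional[int] = self.entries.get(pc)
        if taken:
            self.entries[pc] = target          # enter (or refresh) the entry
            return predicted == target
        if predicted is not None:              # predicted taken, actually not taken
            del self.entries[pc]
            return False
        return True                            # no entry and not taken: correct


btb = BranchTargetBuffer()
first = btb.next_fetch_pc(0x100)               # miss: fetch falls through
ok_first = btb.resolve(0x100, True, 0x80)      # taken: mispredicted, entry added
second = btb.next_fetch_pc(0x100)              # hit: fetch goes to the target
ok_second = btb.resolve(0x100, True, 0x80)     # correctly predicted
ok_third = btb.resolve(0x100, False, 0x80)     # not taken: entry deleted
print(hex(first), ok_first, hex(second), ok_second, ok_third)
```

The second encounter of the branch fetches from the target in the same cycle as the lookup, which is how the BTB removes the taken-branch penalty.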
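Hardware-based speculation (speculative execution)
Idea: execute instructions along the predicted path before the branch outcome is known, while keeping the ability to undo them.
ROB (Re-Order Buffer): holds the results of instructions that have finished execution but have not yet committed;
Commit: the instruction updates the registers or memory only when it is no longer speculative (in program order).
Why the ROB?
Speculation combines three ideas:
1 dynamic branch prediction, to choose which instructions to execute;
2 speculation, to execute instructions before control dependences are resolved; 3 dynamic scheduling.
Extending Tomasulo's algorithm to support speculation
Key points: 1 separate the completion of execution from the commit of the instruction;
2 allow out-of-order execution but require in-order commit.
Results are written (speculatively) to the ROB;
they are delivered over the CDB, and later instructions read their operands from the ROB until the producing instruction commits.
[Figure: Tomasulo organization with a ROB: FP operation queue, FP registers, load/store address unit with load buffers, reservation stations in front of the FP units, reorder buffer, store address and store data paths to memory, load data path from memory, Common Data Bus (CDB).]

Each ROB entry has 4 fields:
1 Instruction type: branch, store, or register operation;
2 Destination: the register number (load, ALU operation) or the memory address (store);
3 Value: holds the result until the instruction commits;
4 Ready: indicates that execution has completed and the value is valid.
The register renaming function of the reservation stations (RS) in Tomasulo's algorithm is taken over by the ROB.

Four steps of instruction execution with a ROB:
1 Issue
Get an instruction from the instruction queue;
issue it if there is an empty reservation station (r) and an empty ROB entry (b), and mark r and b as busy;
send the operands to the reservation station if they are available in the registers or in the ROB;
send the ROB entry number b to the reservation station to tag the result.
2 Execute
If an operand is not yet available, monitor the CDB for it;
when both operands are available, execute the operation (this checks RAW hazards).
3 Write result
When the result is available, write it on the CDB (with its ROB tag), and from the CDB into the ROB and into any waiting reservation stations;
mark the reservation station as available.
Special case, store:
if the value to be stored is available, it is written into the Value field of the store's ROB entry; otherwise the CDB is monitored until the value is broadcast, and the store's ROB entry is updated then.
4 Commit. Three cases:
Normal commit (not a store, not a mispredicted branch): the instruction reaches the head of the ROB with its result present; the register is updated and the instruction is removed from the ROB;
Store: same, except that memory is updated instead of a register;
Mispredicted branch: the speculation was wrong; the ROB is flushed and execution restarts at the correct successor of the branch.
If the branch was correctly predicted, the branch simply finishes.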
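In-order commit is the heart of the ROB. The sketch below models only the commit step, assuming a simplified entry with the four fields listed above plus a hypothetical `mispredicted` flag for branches; it also commits every ready entry at the head in one call, whereas real hardware commits a bounded number per cycle.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class ROBEntry:
    kind: str                        # "reg", "store", or "branch"
    dest: Optional[str] = None       # register name, or memory address for a store
    value: Optional[float] = None
    ready: bool = False
    mispredicted: bool = False       # only meaningful for branches


def commit(rob: Deque[ROBEntry], regs: Dict[str, float],
           memory: Dict[str, float]) -> int:
    """Retire ready entries from the head of the ROB, in program order."""
    committed = 0
    while rob and rob[0].ready:
        head = rob.popleft()
        committed += 1
        if head.kind == "branch":
            if head.mispredicted:
                rob.clear()          # squash everything after the branch
                break
        elif head.kind == "store":
            memory[head.dest] = head.value
        else:
            regs[head.dest] = head.value
    return committed


# MUL.D and SUB.D are finished, DIV.D is still executing, ADD.D is finished.
regs: Dict[str, float] = {}
memory: Dict[str, float] = {}
rob: Deque[ROBEntry] = deque([
    ROBEntry("reg", "F0", 6.0, ready=True),    # MUL.D
    ROBEntry("reg", "F8", 1.0, ready=True),    # SUB.D
    ROBEntry("reg", "F10"),                    # DIV.D, not ready
    ROBEntry("reg", "F6", 3.0, ready=True),    # ADD.D, finished early
])
n = commit(rob, regs, memory)
print(n, regs, len(rob))
```

ADD.D has its result but cannot update F6 yet, because DIV.D ahead of it has not committed; this is what keeps the architectural state precise.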
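Example: assume the latencies are 2 cycles for an FP add, 10 cycles for a multiply, and 40 cycles for a divide. Show the status tables at the moment MUL.D is ready to commit.
    L.D    F6,34(R2)
    L.D    F2,45(R3)
    MUL.D  F0,F2,F4
    SUB.D  F8,F6,F2
    DIV.D  F10,F0,F6
    ADD.D  F6,F8,F2

Reservation stations (when MUL.D is ready to commit from the ROB):
    Name   Busy  Op   Vj                  Vk                  Qj  Qk  Dest  A
    Add1   no
    Add2   no
    Add3   no
    Mult1  no    MUL  Mem[45+Regs[R3]]    Regs[F4]                    #3
    Mult2  yes   DIV                      Mem[34+Regs[R2]]    #3      #5

ROB:
    Entry  Busy  Instruction         Destination  Value
    1      no    L.D   F6, 34(R2)    F6           Mem[34+Regs[R2]]
    2      no    L.D   F2, 45(R3)    F2           Mem[45+Regs[R3]]
    3      yes   MUL.D F0, F2, F4    F0           #2 x Regs[F4]
    4      yes   SUB.D F8, F6, F2    F8           #1 - #2
    5      yes   DIV.D F10, F0, F6   F10
    6      yes   ADD.D F6, F8, F2    F6           #4 + #2

FP register status:
    Field  F0   F2   F4   F6   F8   F10  ...  F30
    ROB #  3              6    4    5
    Busy   yes  no   no   yes  yes  yes       no
Difference from Tomasulo's algorithm without speculation: SUB.D and ADD.D have finished execution, but they cannot commit before MUL.D does, because the ROB commits in program order.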
Multiple issue: outline
1 Statically scheduled superscalar
2 Dynamically scheduled superscalar
3 VLIW & EPIC
[Figure: pipeline timing diagrams over clock cycles 1-7 for instructions I1, I2, I3 passing through IF, ID, EX, MEM, WB, comparing a single-issue pipeline with pipelines that fetch and execute several instructions per cycle.]
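Two kinds of multiple-issue processors:
Superscalar: issues a varying number of instructions per cycle (up to n on an n-issue processor);
scheduled either statically by the compiler or dynamically with Tomasulo-style hardware.
VLIW/EPIC: issues a fixed number of operations per cycle, packed by the compiler into one long instruction.

    Common name                Scheduling                  Examples
    Superscalar (static)       static                      Sun UltraSPARC II/III
    Superscalar (dynamic)      dynamic                     IBM Power2
    Superscalar (speculative)  dynamic with speculation    Pentium III/4, MIPS R10K, Alpha 21264, HP PA 8500, IBM RS64III
    VLIW/LIW                   static                      Trimedia, i860
    EPIC                       mostly static               Itanium (IA-64)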
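Statically scheduled superscalar
Issues from 1 to 8 instructions per cycle.
Example: a 4-issue processor fetches 4 instructions and issues 1~4 of them each cycle, depending on the hazards among them.
Hazards are checked in two steps:
1 among the instructions of the same issue packet;
2 between the issue packet and the instructions already in execution.

A dual-issue MIPS?
Assumption: each cycle issues 1 integer instruction (including loads, stores, and branches) + 1 FP instruction.
Fetch: two instructions (64 bits) per cycle, aligned on a 64-bit boundary.
The second instruction issues only if the first one issues (0~2 instructions issue per cycle).
[Figure: dual-issue pipeline timing. In each cycle an integer instruction (IF ID EX MEM WB) is paired with an FP instruction (IF ID EX EX MEM WB).]
Issuing 1 integer + 1 FP instruction per cycle needs little extra hardware, because the two use different registers and functional units;
FP loads and stores use the integer unit but access the FP registers, which creates extra hazards and port requirements.
Load delay: the result of a load cannot be used by the next 3 instructions without a stall.
Branch delay: 3 instructions, or 2 instructions, depending on the position of the branch in the issue packet.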
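Dynamically scheduled superscalar with Tomasulo's algorithm
Example: run the following loop on a dual-issue MIPS pipeline with Tomasulo's algorithm (F2 holds the scalar).
    Loop: L.D    F0, 0(R1)        // load an array element into F0
          ADD.D  F4, F0, F2       // add the scalar in F2
          S.D    F4, 0(R1)        // store the result
          DADDIU R1, R1, #-8      // decrement the pointer by 8 bytes (one double word)
          BNE    R1, R2, Loop     // branch to Loop if R1 != R2
Assumptions:
One integer ALU is used both for ALU operations and for effective address calculation;
1 integer instruction and 1 FP instruction can issue per cycle.
Latency before the result is written: 1 cycle for integer operations, 2 for loads, 3 for FP adds.
Without speculation, an instruction after the branch can issue but cannot start executing until the branch has executed.
Results are written over the CDB.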
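    Iteration  Instruction           Issues at  Executes at  Memory access at  Write CDB at  Comment
    1          L.D    F0,0(R1)          1           2              3                4         first issue
    1          ADD.D  F4,F0,F2          1           5                               8         wait for L.D
    1          S.D    F4,0(R1)          2           3              9                          wait for ADD.D
    1          DADDIU R1,R1,#-8         2           4                               5         wait for ALU
    1          BNE    R1,R2,Loop        3           6                                         wait for DADDIU
    2          L.D    F0,0(R1)          4           7              8                9         wait for BNE
    2          ADD.D  F4,F0,F2          4          10                              13         wait for L.D
    2          S.D    F4,0(R1)          5           8             14                          wait for ADD.D
    2          DADDIU R1,R1,#-8         5           9                              10         wait for ALU
    2          BNE    R1,R2,Loop        6          11                                         wait for DADDIU
    3          L.D    F0,0(R1)          7          12             13               14         wait for BNE
    3          ADD.D  F4,F0,F2          7          15                              18         wait for L.D
    3          S.D    F4,0(R1)          8          13             19                          wait for ADD.D
    3          DADDIU R1,R1,#-8         8          14                              15         wait for ALU
    3          BNE    R1,R2,Loop        9          16                                         wait for DADDIU
Results:
Issue rate: one iteration of 5 instructions issues every 3 cycles:
IPC = 5/3 = 1.67 instructions/cycle
Execution rate: the 15 instructions take 16 cycles to execute:
15/16 = 0.94 instructions/cycle
The single integer ALU is the bottleneck: it is shared by the ALU operations and the address calculations of L.D and S.D.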
L.D
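VLIW/EPIC
Several independent operations are packed into one very long instruction word;
the instruction word is more than 100 bits long;
each field of the instruction controls one functional unit;
the compiler is responsible for finding and scheduling the parallel operations.
VLIW/EPIC example
Example: a VLIW that issues 5 operations per cycle (2 memory references, 2 FP operations, 1 integer operation or branch). Schedule the unrolled loop on this VLIW.
Results:
8 cycles for 5 iterations,
1.6 cycles per result;
17 operations in 8 cycles, 2.1 operations per cycle;
8 x 5 = 40 operation slots, of which 17 are used: 42.5% efficiency.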
VLIW/EPIC example: the schedule
    Cycle  Memory reference 1   Memory reference 2   FP operation 1     FP operation 2     Integer/branch
    1      L.D F0,0(R1)         L.D F6,-8(R1)
    2      L.D F10,-16(R1)      L.D F14,-24(R1)
    3      L.D F18,-32(R1)                           ADD.D F4,F0,F2     ADD.D F8,F6,F2
    4                                                ADD.D F12,F10,F2   ADD.D F16,F14,F2
    5                                                ADD.D F20,F18,F2
    6      S.D F4,0(R1)         S.D F8,-8(R1)
    7      S.D F12,-16(R1)      S.D F16,-24(R1)                                            DADDIU R1,R1,#-40
    8      S.D F20,8(R1)                                                                   BNE R1,R2,Loop
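The efficiency figures for the VLIW schedule are plain arithmetic over the slot table. The sketch below recomputes them; the placement of each operation in a cycle and slot is an assumption consistent with the 8-cycle, 17-operation schedule, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[str], Optional[str]]

# One tuple per cycle: (memory 1, memory 2, FP 1, FP 2, integer/branch).
schedule: List[Row] = [
    ("L.D F0,0(R1)", "L.D F6,-8(R1)", None, None, None),
    ("L.D F10,-16(R1)", "L.D F14,-24(R1)", None, None, None),
    ("L.D F18,-32(R1)", None, "ADD.D F4,F0,F2", "ADD.D F8,F6,F2", None),
    (None, None, "ADD.D F12,F10,F2", "ADD.D F16,F14,F2", None),
    (None, None, "ADD.D F20,F18,F2", None, None),
    ("S.D F4,0(R1)", "S.D F8,-8(R1)", None, None, None),
    ("S.D F12,-16(R1)", "S.D F16,-24(R1)", None, None, "DADDIU R1,R1,#-40"),
    ("S.D F20,8(R1)", None, None, None, "BNE R1,R2,Loop"),
]


def vliw_metrics(schedule: List[Row], results: int) -> Dict[str, float]:
    """Count the filled slots and derive the usual utilization figures."""
    cycles = len(schedule)
    slots = sum(len(row) for row in schedule)
    ops = sum(1 for row in schedule for op in row if op is not None)
    return {
        "cycles": cycles,
        "operations": ops,
        "cycles_per_result": cycles / results,
        "ops_per_cycle": ops / cycles,
        "efficiency": ops / slots,
    }


metrics = vliw_metrics(schedule, results=5)
print(metrics)
```

More than half of the operation slots stay empty, which is the cost of filling a wide instruction word from a single small loop body.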
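VLIW/EPIC: problems of VLIW
Code size grows (aggressive unrolling and unused slots);
Binary code compatibility between implementations is hard to keep;
Lockstep operation: a stall in one functional unit, for example on a cache miss, stalls the whole processor.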
Next Topic
Memory Hierarchy