CPU Datapaths 3: Intro to Pipelining
CS 154: Computer Architecture Lecture #13
Winter 2020
Ziad Matni, Ph.D.
Dept. of Computer Science, UCSB
Administrative
Talk next week (must attend): Tuesday at 5:00 PM
2/26/20
Matni, CS154, Wi20 2
Lecture Outline
Full Single-Cycle Datapaths
Pipelining
The Main Control Unit
Control signals are derived (i.e. decoded) from the instruction opcode, Op[5:0]
[Figure: instruction fields and their use: opcode, always read, write, sign-extend and add]
Full Datapath showing 7 Control Signals
See Fig. 4.16 in book (p.264) for a description of each signal
One Control Unit to Set them All
Let's do some of these examples:
add $t0, $t1, $t2
addi $a0, $v0, 64
lw $t0, 4($sp)
beq $a1, $a2, blabel
jal jlabel
add $t0, $t1, $t2
rd = rs + rt
rs = $t1 code, rt = $t2 code, rd = $t0 code
ALU output: rs + rt, written to rd
Signal:  RegDst  Branch  Zero  MemRead  MemtoReg  MemWrite  ALUOp  ALUSrc  RegWrite
Value:   1       0       X     0        0         0         0010   0       1
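To make the "rs code / rt code / rd code" labels concrete, here is a minimal sketch that packs the fields of add $t0, $t1, $t2 into a 32-bit word. The register numbers and the funct value are the standard MIPS ones; the helper name encode_r_type is made up for illustration.

```python
# Standard MIPS register numbers for the registers used in this example.
REG = {"$t0": 8, "$t1": 9, "$t2": 10}

def encode_r_type(rs, rt, rd, shamt=0, funct=0x20, opcode=0):
    """Pack R-type fields into a 32-bit word (funct 0x20 is add)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $t1, $t2  ->  rd = $t0, rs = $t1, rt = $t2
word = encode_r_type(rs=REG["$t1"], rt=REG["$t2"], rd=REG["$t0"])
print(f"{word:#010x}")  # 0x012a4020
```

Note that the destination register is written first in assembly but sits third (bits 15:11) in the machine word, which is why RegDst must select the rd field for R-type instructions.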
addi $a0, $v0, 64
rt = rs + immed
rs = $v0 code, rt = $a0 code, immed = 64
ALU output: rs + immed, written to rt
Signal:  RegDst  Branch  Zero  MemRead  MemtoReg  MemWrite  ALUOp  ALUSrc  RegWrite
Value:   0       0       X     0        0         0         0010   1       1
lw $t0, 4($sp)
rt = *(rs + immed)
rs = $sp code, rt = $t0 code, immed = 4
ALU output: rs + immed (the memory address)
Data memory output: value @ (rs + immed), written to rt
Signal:  RegDst  Branch  Zero  MemRead  MemtoReg  MemWrite  ALUOp  ALUSrc  RegWrite
Value:   0       0       X     1        1         0         0010   1       1
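The load path can be traced in a few lines of code. This is only a sketch: the register and memory contents are hypothetical values chosen for illustration, and the helper names are not from the lecture. The field positions and the encoding of lw $t0, 4($sp) follow the standard MIPS I-type format.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode_i_type(word):
    """Split a 32-bit I-type word into (opcode, rs, rt, immed)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    return opcode, rs, rt, sign_extend16(word)

# lw $t0, 4($sp): opcode 0x23, rs = 29 ($sp), rt = 8 ($t0), immed = 4
opcode, rs, rt, immed = decode_i_type(0x8FA80004)

# Hypothetical machine state, for illustration only.
regs = {29: 0x7FFFFFF0}
memory = {0x7FFFFFF4: 1234}

address = (regs[rs] + immed) & 0xFFFFFFFF  # ALU computes rs + immed
regs[rt] = memory[address]                 # value @ (rs + immed) goes to rt
```

The sign extension matters for negative offsets such as lw $t0, -4($sp), where the 16-bit field 0xFFFC must become -4 before the add.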
beq $a1, $a2, blabel
Assume in this example that $a1 = $a2
rs = $a1 code, rt = $a2 code, immed = word offset to blabel
ALU computes rs - rt; Zero = 1 because the two registers are equal
New address = PC + 4 + (sign-extended immed << 2)
Signal:  RegDst  Branch  Zero  MemRead  MemtoReg  MemWrite  ALUOp  ALUSrc  RegWrite
Value:   1       1       1     0        0         0         0110   0       0
RegDst and MemtoReg have no effect here because RegWrite = 0
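The four worked examples can be collected into one lookup table, which is roughly what the main control unit does in hardware. This is a sketch, not the lecture's implementation: the signal values are copied from the slides, the opcodes are the standard MIPS ones, and the names Control, decode, and pc_src are invented here.

```python
from typing import NamedTuple

class Control(NamedTuple):
    RegDst: int
    Branch: int
    MemRead: int
    MemtoReg: int
    MemWrite: int
    ALUOp: str
    ALUSrc: int
    RegWrite: int

# Keyed by opcode; values copied from the four examples above.
CONTROL = {
    0x00: Control(1, 0, 0, 0, 0, "0010", 0, 1),  # R-type (ALUOp shown is for add)
    0x08: Control(0, 0, 0, 0, 0, "0010", 1, 1),  # addi
    0x23: Control(0, 0, 1, 1, 0, "0010", 1, 1),  # lw
    0x04: Control(1, 1, 0, 0, 0, "0110", 0, 0),  # beq
}

def decode(instruction):
    """Look up the control signals from Op[5:0], the top 6 bits."""
    return CONTROL[(instruction >> 26) & 0x3F]

def pc_src(control, zero):
    """The branch is taken only when Branch AND the ALU's Zero output are 1."""
    return control.Branch & zero
```

For example, decode(0x8FA80004) returns the lw row, and pc_src(decode(0x10A60003), zero=1) is 1 for a beq $a1, $a2 whose operands are equal. Zero is passed in separately because it is an ALU output, not something decoded from the opcode.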
R-Type Instruction
Load Instruction
Branch-on-Equal Instruction
Reminder: Implementing Jumps
Jump uses a word address
Update PC with the concatenation of the 4 most significant bits of the old PC (after the +4 increment),
the 26-bit jump address, and 00 at the end
Need an extra control signal decoded from the opcode
Need to implement a couple of other logic blocks
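The concatenation described above can be sketched with shifts and masks. The function name jump_target and the example addresses are illustrative assumptions; the bit layout follows the standard MIPS J-type format.

```python
def jump_target(pc, instruction):
    """Concatenate (PC + 4)[31:28], the 26-bit address field, and 00."""
    upper = (pc + 4) & 0xF0000000         # 4 most significant bits of PC + 4
    addr26 = instruction & 0x03FFFFFF     # low 26 bits of the instruction
    return upper | (addr26 << 2)          # shifting left by 2 appends 00

# Hypothetical example: a jump at 0x00400010 whose address field is 0x0100000.
target = jump_target(0x00400010, 0x08100000)
print(f"{target:#010x}")  # 0x00400000
```

Because the field holds a word address, shifting it left by two turns it back into a byte address, and the upper four bits keep the jump inside the current 256 MB region.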
Jump Instruction
Performance Issues
Longest delay determines clock period
Critical path: load instruction
Goes: instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle: making the common case fast
We can/will improve performance by pipelining
Pipelining Analogy
Pipelined laundry: overlapping execution
An example of how parallelism improves throughput
4 loads sped up: from 8 hrs to 3.5 hrs, a speed-up factor of about 2.3
But as the number of loads approaches infinity:
Speed-up factor approaches 4 = number of stages
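The laundry numbers can be reproduced with a short calculation. The sketch below assumes four stages of half an hour each, which is inferred from the 8-hour and 3.5-hour totals rather than stated on the slide; the function names are made up.

```python
def sequential_hours(loads, stages=4, stage_hours=0.5):
    """Each load finishes all stages before the next one starts."""
    return loads * stages * stage_hours

def pipelined_hours(loads, stages=4, stage_hours=0.5):
    """The first load takes all stages; each later load finishes one stage-time after."""
    return (stages + loads - 1) * stage_hours

def speedup(loads):
    return sequential_hours(loads) / pipelined_hours(loads)

print(sequential_hours(4), pipelined_hours(4))  # 8.0 3.5
print(round(speedup(4), 1))                     # 2.3
print(round(speedup(1_000_000), 3))             # 4.0
```

The speed-up falls short of 4 for small inputs because of the time spent filling and draining the pipeline; that overhead becomes negligible as the number of loads grows.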
MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
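One way to picture the five stages is a cycle-by-cycle chart in which each instruction starts one cycle after the previous one. The sketch below builds such a chart for an ideal pipeline with no stalls or hazards, which is a simplifying assumption; the three-instruction program is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instructions):
    """Return one row per instruction; column c holds its stage in cycle c + 1."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i in range(len(instructions)):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows

program = ["lw", "add", "beq"]  # hypothetical instruction sequence
for name, row in zip(program, pipeline_diagram(program)):
    print(f"{name:<4}" + " ".join(f"{cell:<3}" for cell in row))
```

Reading down any column shows which instructions are active in that cycle; with three instructions, the chart spans 3 + 5 - 1 = 7 cycles.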
Pipeline Performance
Assume time for stages is:
100 ps for register read or write
200 ps for other stages
Compare pipelined datapath with single-cycle datapath
Comparison of Per-Instruction Time
Single-cycle: Tc = 800 ps
Pipelined: Tc = 200 ps
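Both clock periods follow from the stage times assumed earlier. In this sketch the 100 ps figure is assigned to ID (register read) and WB (register write), and the single-cycle period is taken from a load, which uses all five stages; the dictionary name is invented.

```python
# Stage latencies in ps: 100 ps for register read/write, 200 ps otherwise.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Single-cycle: the clock must fit the slowest instruction (a load uses every stage).
single_cycle_tc = sum(STAGE_PS.values())

# Pipelined: the clock must fit the slowest single stage.
pipelined_tc = max(STAGE_PS.values())

print(single_cycle_tc, pipelined_tc)  # 800 200
```

The pipelined clock is set by the 200 ps stages even though two stages need only 100 ps, so those faster stages sit idle for half of each cycle.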
Improvement
In the previous example, per-instruction improvement was 4x: 800 ps to 200 ps
But total execution time went from 2400 ps to 1400 ps (~1.7x improvement)
That's because we're only looking at 3 instructions
What if we looked at 1,000,003 instructions?
Pipelined, total execution time = 1,000,000 x 200 ps + 1400 ps = 200,001,400 ps
Non-pipelined, total execution time = 1,000,000 x 800 ps + 2400 ps = 800,002,400 ps
Improvement = 800,002,400 ps / 200,001,400 ps ≈ 4.00
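The totals above can be checked with a small model. It assumes an ideal five-stage pipeline with no stalls, so the pipelined time is the fill time plus one cycle per remaining instruction; the function names are made up for this sketch.

```python
def single_cycle_time(n, tc=800):
    """n instructions, each taking one 800 ps cycle."""
    return n * tc

def pipelined_time(n, tc=200, stages=5):
    """Fill the pipeline once, then one instruction completes every cycle."""
    return (stages + n - 1) * tc

for n in (3, 1_000_003):
    ratio = single_cycle_time(n) / pipelined_time(n)
    print(n, single_cycle_time(n), pipelined_time(n), round(ratio, 2))
# 3 2400 1400 1.71
# 1000003 800002400 200001400 4.0
```

The fill cost of (stages - 1) cycles is fixed, so it dominates short programs and becomes negligible for long ones.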
About Pipeline Speedup
If all stages are balanced, i.e. all take the same time:
Time between instructions (pipelined) = Time between instructions (non-pipelined) / # of stages
If not balanced, speedup will be less
Speedup is due to increased throughput, but instruction latency does not change
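A quick comparison shows why unbalanced stages reduce the speedup. The sketch below compares a hypothetical balanced five-stage design against the 200/100/200/200/100 ps stage times used earlier in the lecture; the function name is an assumption.

```python
def steady_state_speedup(stage_ps):
    """Ratio of time between instructions: non-pipelined over pipelined."""
    non_pipelined = sum(stage_ps)  # one instruction runs through every stage
    pipelined = max(stage_ps)      # the slowest stage sets the clock period
    return non_pipelined / pipelined

balanced = [200, 200, 200, 200, 200]    # hypothetical balanced design
unbalanced = [200, 100, 200, 200, 100]  # stage times from the earlier example

print(steady_state_speedup(balanced))    # 5.0
print(steady_state_speedup(unbalanced))  # 4.0
```

With balanced stages the speedup equals the number of stages; with the lecture's stage times it drops to 4 because the two 100 ps stages still occupy a full 200 ps cycle.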
MIPS vs. Others in Pipelining
MIPS (and RISC-types in general) simplification advantages:
All instructions are the same length (32 bits)
x86 has variable-length instructions (8 bits to 120 bits)
MIPS has only 3 instruction formats (R, I, J)
rs fields are all in the same place
x86 requires extra pipelines because its formats don't have this regularity
Memory ops only appear in load/store
x86 requires extra pipelines because its memory ops aren't limited to load/store
YOUR TO-DOs for the Week
Lab 6 due soon