[SOLVED] CS代考计算机代写 computer architecture compiler scheme mips RISC-V algorithm Lecture 6:

30 $

File Name: CS代考计算机代写_computer_architecture_compiler_scheme_mips_RISC-V_algorithm_Lecture_6:.zip
File Size: 885.48 KB

SKU: 3300483474 Category: Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

Or Upload Your Assignment Here:


Lecture 6:
Arithmetic 2/3
John Owens
Introduction to Computer Architecture UC Davis EEC 170, Winter 2021

Cascading Adders
▪ Cascade Full Adders to make multibit adder:
▪ A+B=S
A B
Cin
A B
Cin
A B
Cin
A B
Cin
Cout Sum
Cout Sum
Cout Sum
Cout Sum
A3
Full Adder
B3
A2
S3
Full Adder
B2
S2
A1
Full Adder
B1
S1
A0
Full Adder
B0
S0
40
UC Davis EEC 170, Winter 2021 / © John Owens

But What About Performance? ▪ Critical path of one bitslice is CP
▪ Critical path of n-bit rippled-carry adder is n*CP
– Thought experiment:
– @ 7 nm: FO4 is ~2.5 ps
– 2 gate delays * 64b = 320 ps
– 4 GHz clock period is 250 ps
▪ Design Trick:
– Throw hardware at it
A0 B0
A1 B1
A2 B2
A3 B3
CarryIn0
1-bit ALU
CarryOut0
Result0
Result1
Result2
Result3
CarryIn1
1-bit ALU
CarryOut1
CarryIn2
1-bit ALU
CarryOut2 CarryIn3
CarryOut3
41
UC Davis EEC 170, Winter 2021 / © John Owens
1-bit ALU

Truth Table for Adder Bit Slice ▪ 3 inputs (A, B, Cin); 2 outputs (Sum, Cout)
A
B
Cin
Sum
Cout
0
0
0
0
0
0
0
1
1
0
0
1
0
1
0=Cin
0
1
1
0
1=Cin
1
0
0
1
0=Cin
1
0
1
0
1=Cin
1
1
0
0
1
1
1
1
1
1
42 UC Davis EEC 170, Winter 2021 / © John Owens

A0 B0
A1 B1
A2 B2
A3 B3
Carry Look Ahead (Design trick: peek)
C0 = Cin
S
S
S
S
C4 = . . .
G
P
G = A and B P = A xor B WHY are these interesting?
43
UC Davis EEC 170, Winter 2021 / © John Owens
A B 0 0
0 1
1 0
1 1
Cout
0 “kill”
Cin “propagate” Cin “propagate” 1 “generate”
G P
G P
C2 = G1 + G0 · P1 + C0 · P0 · P1
C1 = G0 + C0 · P0
G P
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G P

CLA vs. Ripple
C0 = Cin
S
G = A and B P = A xor B
A0 B0
CarryIn0 A0
Result0
Result1
Result2
Result3
G P
B0 A1 A1
B1
B1
A2 B2
1-bit ALU
C1 = G0 + C0 · P0
C2 = G1 + G0 · P1 + C0 · P0 · P1
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G
P
CarryOut0 CarryIn1
S
1-bit ALU
G P
A2
A3 B3
CarryOut1 CarryIn2
S
1-bit ALU
G P
B2
A3 B3
CarryOut2 CarryIn3
1-bit ALU
CarryOut3
G P
S
C4 = . . .
44
UC Davis EEC 170, Winter 2021 / © John Owens

Cascaded Carry Look-ahead (16-bit)
C
L C0
▪ Abstraction!
▪ Good exercise to work
A
4-bit Adder
4-bit Adder
4-bit Adder
G0 P0
C1 = G0 + C0 · P0
out what G and P are for a 4-bit adder
C2 = G1 + G0 · P1 + C0 · P0 · P1
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G P
C4 = . . .
45
UC Davis EEC 170, Winter 2021 / © John Owens

Design Trick: Guess (or “Precompute”)
CP(2n) = 2*CP(n)
n-bit adder
n-bit adder
CP(2n) = CP(n) + CP(mux)
n-bit adder
1
n-bit adder
0
n-bit adder
Cout
Carry-select adder
46
UC Davis EEC 170, Winter 2021 / © John Owens

Carry Skip Adder: reduce worst case delay
B A4 B A0
P3 S P3 S
4-bit Ripple Adder
4-bit Ripple Adder
P2
P1
P0
P2
P1
P0
▪ Just speed up the slowest case for each block
▪ Exercise: optimal design uses variable block sizes (why?)
47 UC Davis EEC 170, Winter 2021 / © John Owens

Adder Lessons
▪ Reuse hardware if possible
– +/- reuse is compelling argument for 2’s complement
▪ For higher performance:
– Look for critical path, optimize for it Reorganize equations [propagate/generate / carry lookahead]
– Precompute [carry save]
– Reduce worst-case delay [carry skip]
48 UC Davis EEC 170, Winter 2021 / © John Owens

Multiply (unsigned)
▪ Paper and pencil example (unsigned):
Multiplicand 1000 Multiplier x1001 1000
0000 0000
1000 Product 01001000
▪ m bits x n bits = m+n bit product
▪ Binary makes it easy:
– 0 => place 0 ( 0 x multiplicand )
– 1 => place a copy ( 1 x multiplicand )
▪ 4 versions of multiply hardware & algorithm (see book):
– successive refinement
49 UC Davis EEC 170, Winter 2021 / © John Owens
§3.3 Multiplication

m bits x n bits = m+n bit product
How do we store a 128b result from a 64b x 64b multiply?
50 UC Davis EEC 170, Winter 2021 / © John Owens

Unsigned Combinational Multiplier
0000
A3 A2 A1 A0
A3 A2
A3 A2
A3 A2
P7 P6 P5 P4 P3 P2 P1 P0
B0 B1
B2 B3
A1
A0
A1
A0
A1
A0
51 UC Davis EEC 170, Winter 2021 / © John Owens

Unsigned Combinational Multiplier
▪ ▪

At each stage shift A left (multiply it by 2)
Use next bit of B to determine whether to add in shifted multiplicand (Stage i A3 accumulates A * 2i if Bi == 1)
A3 A2
0000
A3 A2 A1 A0 A2
B0 B1
B2 B3
A1
A0
A1
A0
Accumulate 2n bit partial product at each stage
A3 A2
A1
A0
▪ Q: How much
hardware for 32 bit multiplier? Critical path?
P7 P6 P5 P4 P3 P2 P1 P0
52 UC Davis EEC 170, Winter 2021 / © John Owens

Unsigned shift-add multiplier (version 1)
▪ 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit multiplier reg
Multiplicand
64 bits
Shift Left
Multiplier
32 bits
Control
Shift Right
64-bit ALU
Product
64 bits
Write
Multiplier = datapath + control
53
UC Davis EEC 170, Winter 2021 / © John Owens

Multiply Algorithm V1
Product
0000 0000 0011
1:0000 0010 0011 2:0000 0010 0011 3:0000 0010 0001 1:0000 0110 0001 2:0000 0110 0001 3:0000 0110 0000
0000 0110 0000
Start
1. Test Multiplier0
Multiplier Multiplicand 0000 0010
Multiplier0 = 1
Multiplier0 = 0
0000 0010
0000 0100
0000 0100 0000 0100 0000 1000 0000 1000
0000 1000
1a. Add multiplicand to product & place the result in Product register
54
UC Davis EEC 170, Winter 2021 / © John Owens
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
No: < 32 repetitionsYes: 32 repetitions DoneHow can we make this more efficient? ▪ Remember original combinational multiplier:0000A3 A2 A1 A0A3 A2A3 A2A3 A2P7 P6 P5 P4 P3 P2 P1 P0B0 B1B2 B3 A1A0A1A0A1A055 UC Davis EEC 170, Winter 2021 / © John Owens Simply warp to let product move right…0000A3 A2 A1 A0A3 A2 A1 A0A3 A2 A1 A0A3 A2 A1 A0B0B1B2B3P7 P6 P5 P4 P3 P2 P1 P0▪ Multiplicand stays still and product moves right56 UC Davis EEC 170, Winter 2021 / © John Owens Multiply Hardware Version 3▪ 32-bit Multiplicand reg, 32 -bit ALU, 64-bit Product reg, (0-bit Multiplier reg)- Your book has details on thisMultiplicand 32 bits32-bit ALUProduct (Multiplier) 64 bitsShift RightWriteControl57UC Davis EEC 170, Winter 2021 / © John Owens Faster Multiplier▪Uses multiple adders- Cost/performance tradeoff ▪Can be pipelined- Several multiplications performed in parallel58 UC Davis EEC 170, Winter 2021 / © John Owens Motivation for Booth’s Algorithm▪ ▪Example 2 x 6 = 0010 x 0110:0010 x 0110+ + + +0000 shift (0 in multiplier) 0010 add (1 in multiplier) 0010 add (1 in multiplier)0000 shift (0 in multiplier) 000110059 UC Davis EEC 170, Winter 2021 / © John Owens Motivation for Booth’s Algorithm▪ ALU with add or subtract can get same result in more than one way:▪ 6 = 4 + 2 = -2 + 80110 = -00010 + 01000 = 11110 + 01000▪ For example:0010 (2 [multiplier]) x 0110 (6 [multiplicand])0000 – 00100000 + 001000001100shift (0 in multiplier)sub (first 1 in multpl.)shift (mid string of 1s)add (prior step had last 1)60 UC Davis EEC 170, Winter 2021 / © John Owens Booth’s Algorithm end of run ▪ Current Bit1 0 1 1 0 1 0 0011110Bit to the Right ExplanationBegins run of 1sMiddle of run of 1sEnd of run of 1sExample Op 0001111000 sub 0001111000 none 0001111000 add 0001111000 noneMiddle of run of 0s▪ Originally for speed (when shift was faster than add)middle of run▪ Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one▪ Handles two’s complement!beginning of run61 UC Davis EEC 170, Winter 2021 / © John Owens Booth’s Example (2 x 7)Big picture: Treat 7 as 8 – 1Operation0. initial value1a. P = P – m 1b.2.3.4a. 4b.Multiplicand 0010Product0000 0111 0 1110 0111 0 1111 0011 1 1111 1001 1 1111 1100 1 0001 1100 1 0000 1110 0next?10 -> sub
shift P (sign ext) 11 -> nop, shift 11 -> nop, shift 01 -> add
shift done
Blue + red = multiplier
(red is 2 bits to compare); Green = arithmetic ops; Black = product
62 UC Davis EEC 170, Winter 2021 / © John Owens
0010
0010
0010
0010
0010
0010
+ 1110->
+ 0010->

Booth’s Example (2 x -3)
Operation
0. initial value
1a. P = P -m 1b.
2a.
2b.
3a.
3b. 4a.
4b.
Blue + red = multiplier (red is 2 bits to compare); Green = arithmetic ops; Black = product
Multiplicand 0010
1110 + 1110 0010
0010
+ 1110
0010 0010
Product
0000 1101 0 1110 1101 0
01
next?
10 -> sub
shift P (sign ext) 01 -> add + 0010
1111 011
0001 011 0000 1011 0
01 shift P
10 -> sub shift
1110 1011 0
1111 0101 1 11 -> nop
0010
1111 0101 1 1111 1010 1
shift done
63
UC Davis EEC 170, Winter 2021 / © John Owens

Radix-4 Modified Booth’s Algorithm
Current Bits
00 01 10 11
00 01 10 11
Bit to the Right
0 0 0 0
1 1 1 1
Explanation Example Recode
Middle of zeros Single one Begins run of 1s Begins run of 1s
Ends run of 1s Ends run of 1s Isolated 0 Middle of run
00 00 00 00 00 00 00 00 01 00 00 01 11 10 00 00 01 11 11 00
00 00 11 11 00 00 01 11 11 00 00 11 10 11 00 00 11 11 11 00
0
1 -2 -1
1
2 -1 0
Same insight as one-bit
Allows multiplication 2 bits at a time.
Booth’s, simply adjust for alignment of 2 bits.
64
UC Davis EEC 170, Winter 2021 / © John Owens

RISC-V Multiplication Support
▪ Four multiply instructions: – mul: multiply
– Gives the lower 64 bits of the product – mulh: multiply high
– Gives the upper 64 bits of the product, assuming the operands are signed
– mulhu: multiply high unsigned
– Gives the upper 64 bits of the product, assuming the operands are unsigned
– mulhsu: multiply high signed/unsigned
– Gives the upper 64 bits of the product, assuming one operand is signed and the other unsigned
– Use mulh result to check for 64-bit overflow
65 UC Davis EEC 170, Winter 2021 / © John Owens

RISC-V Support for multiply
▪ “If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order andrdhcannot be the same asrs1or
rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.”
66 UC Davis EEC 170, Winter 2021 / © John Owens

Multiplication Summary
▪ Iterative algorithm ▪ Design techniques:
– Analyze hardware—what’s not in use?
– Spend more hardware to get higher performance
– Booth’s Algorithm—more general (2’s complement)
– Booth’s Algorithm—recoding is powerful technique to think about problem in a different way
– Booth’s Algorithm—more bits at once gives higher performance
67 UC Davis EEC 170, Winter 2021 / © John Owens

RISC-V Support for divide
▪ 4 instructions:
-{div, divu, rem, remu} rd, rs1, rs2
– – – –
▪ “If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr,
rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides.”
▪ Overflow and division-by-zero don’t produce errors
– Just return defined results
– Faster for the common case of no error
div: rs1 / rs2, treat as signed
divu: rs1 / rs2, treat as unsigned
rem: rs1 mod rs2, treat as signed
remu: rs1 mod rs2, treat as unsigned
72 UC Davis EEC 170, Winter 2021 / © John Owens

MIPS Support for multiply/divide
▪ Rather than target the general-purpose registers:
– –
hi
mul placed its output into two specialhiandloregisters div placed its divide output intoloand its rem output into

general-purpose register)
MIPS providedmfloandmfhiinstructions (destination:
73 UC Davis EEC 170, Winter 2021 / © John Owens

Divide: Paper & Pencil
1001 Quotient Divisor 1000 1001010 Dividend
–100100
101
1010
–1000
10 Remainder
(or Modulo result)

▪ ▪
See how big a number can be subtracted, creating quotient bit on each step
– Binary => 1 * divisor or 0 * divisor Dividend = Quotient x Divisor + Remainder 3 versions of divide, successive refinement
74 UC Davis EEC 170, Winter 2021 / © John Owens

Divide Hardware Version 1
▪ 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
Divisor
64 bits
Shift Right
64-bit ALU
Remainder
64 bits
Write
Quotient
32 bits
Control
Shift Left
75
UC Davis EEC 170, Winter 2021 / © John Owens

Divide Algorithm V1
▪ Takes n+1 steps for n-bit Quotient & Rem.
Start: Place Dividend in Remainder
▪ Remainder Quotient 0000 0111 0000
Divisor 0010 0000
Remainder ≥ 0
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder
Remainder < 0 2a. Shift the Quotient register to the left setting the new rightmostbit to 1.2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, settingthe new least significant bit to 0.“restoring” division 3. Shift the Divisor register right1 bit.Yes: n+1 repetitions (n = 4 here)Donen+1 repetition?No: < n+1 repetitions76UC Davis EEC 170, Winter 2021 / © John Owens Divide Algorithm I example (7 / 2)Remainder QuotientDivisor 0000 0111 1:1110 0111 000002:0000 0111 0000 3:0000 0111 0000 1:1111 0111 00002:0000 0111 000000000010 0000 0010 00000010 00000001 00000001 00000001 00000000 10000000 10000000 10000000 01000000 01000000 01000000 00100000 00100000 00100000 0001Answer: Quotient = 3 Remainder = 100 0 000 03:0000 0111 000 1:11111111 0002:0000 0111 0000 0003:0000 0111 00000 1:0000 0011 00000 0001 2:0000 0011 03:0000 0011 00001 0001 2:0000 0001 000113:0000 0001 000111:0000 0001 077UC Davis EEC 170, Winter 2021 / © John Owens Divide Hardware Version 3▪ 32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)Divisor32 bits32-bit ALU“HI” “LO”Shift LeftWriteRemainder(Quotient)Control64 bits78UC Davis EEC 170, Winter 2021 / © John Owens Final Multiply / Divide Hardware Multiplicand 32 bits 32-bit ALU64 bits32 bits32-bit ALU“HI”Shift RightWriteControl Product (Multiplier) Divisor “LO”Shift LeftWriteRemainder (Quotient) Control 64 bits 79UC Davis EEC 170, Winter 2021 / © John Owens SRT DivisionD. Sweeney of IBM, J.E. Robertson of the University of Illinois, and T.D. Tocher of Imperial College, London CurrentRemainderP-D (Partial Divisor) Plot9|9 43211111 8|8 42211110 7|7 32111100 6|6 32111000 5 | 5 2 1 1 1 0 00 0 4 | 4 2 1 1 0 0 00 0 3|3 11000000 2|2 10000000 1|1 00000000 0 +——————0 1 2 3 4 5 6 78 9 80 UC Davis EEC 170, Winter 2021 / © John Owens SRT Division▪ Intel Pentium divide implementation: SRT division with 2 bits/iteration (radix 4) ▪ Allows negative entries▪ 1066 entries in lookup table81UC Davis EEC 170, Winter 2021 / © John Owens[http://members.cox.net/srice1/ pentbug/introduction.html] Faster Division▪ Can’t use parallel hardware as in multiplier- Subtraction is conditional on sign of remainder▪ Faster dividers (e.g. SRT division) generate multiple quotient bits per step- Still require multiple steps82 UC Davis EEC 170, Winter 2021 / © John Owens Division Lessons▪ In practice, slower than multiplication- Also less frequent- But, in the simple case, can use same hardware!▪ Generates quotient and remainder together▪ Floating-point division faster than integer division (why?) ▪ Similar hardware lessons as multiplier:- Look for unused hardware- Can process multiple bits at once at cost of extra hardware83 UC Davis EEC 170, Winter 2021 / © John Owens RISC-V logical instructions Instruction Meaning Pseudocode XORI rd,rs1,imm SRAI rd,rs1,immExclusive Or Immediaterd ← ux(rs1) ⊕ ux(imm) ORI rd,rs1,imm Or Immediaterd ← ux(rs1) ∨ ux(imm) ANDI rd,rs1,imm SLLI rd,rs1,immAnd Immediaterd ← ux(rs1) ∧ ux(imm)Shift Left Logical Immediaterd ← ux(rs1) « ux(imm) SRLI rd,rs1,imm Shift Right Logical Immediaterd ← ux(rs1) » ux(imm)SLL rd,rs1,rs2SRL rd,rs1,rs2Shift Right Arithmetic Immediaterd ← sx(rs1) » ux(imm)Shift Left Logicalrd ← ux(rs1) « rs2 XOR rd,rs1,rs2 Exclusive Orrd ← ux(rs1) ⊕ ux(rs2)Shift Right Logicalrd ← ux(rs1) » rs2 SRA rd,rs1,rs2 OR rd,rs1,rs2Shift Right Arithmeticrd ← sx(rs1) » rs2Orrd ← ux(rs1) ∨ ux(rs2) AND rd,rs1,rs2 Andrd ← ux(rs1) ∧ ux(rs2)84 UC Davis EEC 170, Winter 2021 / © John Owens Manipulating Bits with Shift, Or, And▪▪ ▪ ▪ ▪▪Mask out exponent with andS EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM& 0 11111111 00000000000000000000000————————————0 EEEEEEEE 00000000000000000000000Right shift 23 bits to get000000000000000000000000 EEEEEEEEDo arithmetic manipulation000000000000000000000000 ENEWENEWLeft shift 23 bits to get0 ENEWENEW 00000000000000000000000Zero out old exponentS EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM& 1 00000000 11111111111111111111111————————————S 00000000 MMMMMMMMMMMMMMMMMMMMMMMOr in new exponentS 00000000 MMMMMMMMMMMMMMMMMMMMMMM| 0 ENEWENEW 00000000000000000000000————————————S ENEWENEW MMMMMMMMMMMMMMMMMMMMMMM85 UC Davis EEC 170, Winter 2021 / © John Owens Shift Operations▪ Arithmetic operation:- Example: 00011 << 2 [3 left shift 2]- 00011 << 2 = 01100 = 12 = 2 * 4- Each bit shifted left == multiply by two- Example: 01010 >> 1 [10 right shift 1]
– 01010 >> 1 = 00101 = 5 = 10/2
– Each bit shifted right == divide by two
– Why?
– Compilers do this—“strength reduction”
86 UC Davis EEC 170, Winter 2021 / © John Owens

Shift Operations


With left shift, what do we shift in?
– 00011 << 2 = 01100 (arithmetic)- 0000XXXX << 4 = XXXX0000 (logical)- We shifted in zeroesHow about right shift?- XXXX0000 >> 4 = 0000XXXX (logical) – Shifted in zero
– 00110 (= 6) >> 1 = 00011 (3) (arithmetic) – Shifted in zero
– 11110 (= -2) >> 1 = 11111 (-1) (arithmetic) – Shifted in one
87 UC Davis EEC 170, Winter 2021 / © John Owens

Shift Operations
▪ How about right shift?
– XXXX0000 >> 4 = 0000XXXX: Logical shift
– Shifted in zero
– 00110 (= 6) >> 1 = 00011 (3)
11110 (= -2) >> 1 = 11111 (-1): Arithmetic shift
– Shifted in sign bit
▪ RISC-V supports both logical and arithmetic:
– slli, srai, srli: Shift amount taken from within instruction (“imm”)
6 bits 6 bits 5 bits 3 bits 5 bits
7 bits 5 bits 5 bits 3 bits 5 bits
– sll, sra, srl: shift amount taken from register (“variable”)
– How far can we shift with slli/srai/slli? With sll/sra/srl?
7 bits
7 bits
funct6
shamt
rs1
funct3
rd
opcode
funct7
rs2
rs1
funct3
rd
opcode
88
UC Davis EEC 170, Winter 2021 / © John Owens

Combinational Shifter from MUXes
Basic Building Block
AB sel
D
8-bit right shifter
A7 A6 A5 A4 A3 A2 A1 A0
S2 S1 S0
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
▪ What comes in the MSBs?
▪ How many levels for 64-bit shifter? ▪ What if we use 4-1 Muxes ?
R7 R6 R5 R4 R3 R2 R1 R0
89 UC Davis EEC 170, Winter 2021 / © John Owens
10

General Shift Right Scheme using 16 bit example
S0 (0,1)
S1 (0,2)
S2 (0,4)
S3 (0,8)
▪ If we added right-to-left connections, we could supportROTATE (not in RISC-V but found in other ISAs)
90 UC Davis EEC 170, Winter 2021 / © John Owens

Funnel Shifter ▪ Shift A by i bits
▪ Problem: Set Y, X, sa ▪ Logical:
▪ Arithmetic: ▪ Rotate:
▪ Left shifts:
Extract 64 bits of 128 (sa selects which bits)
| sa |
Y
X
Y
64
X
64
R
Shift Right
91
UC Davis EEC 170, Winter 2021 / © John Owens
64 R

Barrel Shifter
▪ Technology-dependent solutions: transistor per switch
in6
in5
in4
out3
out2
out1
out0
SR3
SR2 SR1 SR0
in3
in2 in1 in0
92
UC Davis EEC 170, Winter 2021 / © John Owens

Shifter Summary
▪ Shifts common in logical ops, also in arithmetic ▪ RISC-V has:
– 2 flavors of shift: logical and arithmetic
– 2 directions of shift: right and left
– 2 sources for shift amount: immediate, variable
▪ Lots of cool shift algorithms, but …
– Barrel shifter prevalent in today’s hardware
93 UC Davis EEC 170, Winter 2021 / © John Owens

Floating Point
▪ Representation for non-integral numbers
– Including very small and very large numbers
▪ Like scientific notation
– –2.34 × 1056
– +0.002 × 10–4
– +987.02 × 109 ▪ In binary
– ±1.xxxxxxx2 × 2yyyy
▪ Typesfloatanddoublein C
normalized
not normalized
94
UC Davis EEC 170, Winter 2021 / © John Owens
§3.5 Floating Point

Floating Point Standard
▪ Defined by IEEE Std 754-1985
▪ Developed in response to divergence of representations
– Portability issues for scientific code ▪ Now almost universally adopted
▪ Two representations
– Single precision (32-bit)
– Double precision (64-bit)
95 UC Davis EEC 170, Winter 2021 / © John Owens

96 UC Davis EEC 170, Winter 2021 / © John Owens

Floating-point Formats
▪ Single-precision (32 bits)
▪ Double precision (64 bits)
97 UC Davis EEC 170, Winter 2021 / © John Owens

IEEE Floating-Point Format
x = (−1)S ×(1+Fraction)×2(Exponent−Bias)
▪ S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
▪ Normalize significand: 1.0 ≤ |significand| < 2.0- Always has a leading pre-binary-point 1 bit, so no need torepresent it explicitly (hidden bit)- Significand is Fraction with the “1.” restoredFraction: single: 23 bits double: 52 bits▪ Exponent: excess representation: actual exponent + Bias- Ensures exponent is unsigned- Single: Bias = 127; Double: Bias = 102398 UC Davis EEC 170, Winter 2021 / © John OwensExponent: single: 8 bits double: 11 bits Single-Precision Range▪ Exponents 00000000 and 11111111 reserved▪ Smallest value- Exponent: 00000001⇒ actual exponent = 1 – 127 = –126- Fraction: 000…00 ⇒ significand = 1.0- ±1.0 × 2–126 ≈ ±1.2 × 10–38▪ Largest value- exponent: 11111110⇒ actual exponent = 254 – 127 = +127- Fraction: 111…11 ⇒ significand ≈ 2.0- ±2.0 × 2+127 ≈ ±3.4 × 10+3899UC Davis EEC 170, Winter 2021 / © John Owens Double-Precision Range▪ Exponents 0000…00 and 1111…11 reserved▪ Smallest value- Exponent: 00000000001⇒ actual exponent = 1 – 1023 = –1022- Fraction: 000…00 ⇒ significand = 1.0- ±1.0 × 2–1022 ≈ ±2.2 × 10–308▪ Largest value- Exponent: 11111111110⇒ actual exponent = 2046 – 1023 = +1023- Fraction: 111…11 ⇒ significand ≈ 2.0- ±2.0 × 2+1023 ≈ ±1.8 × 10+308100UC Davis EEC 170, Winter 2021 / © John Owens Floating-Point Precision▪ Relative precision- all fraction bits are significant- Single: approx 2–23- Equivalent to 23 × log102 ≈ 23 × 0.3 ≈ 6 decimal digits ofprecision- Double: approx 2–52- Equivalent to 52 × log102 ≈ 52 × 0.3 ≈ 16 decimal digits of precision101UC Davis EEC 170, Winter 2021 / © John Owens Floating-Point Example▪ Represent –0.75 = –(0.5 + 0.25) = –(1.0 + 0.5) / 2- -–0.75 = (–1)1 × 1.12 × 2–1 – S =1Fraction = 1000…002 – Exponent = –1 + Bias- Single: –1 + 127 = 126 = 011111102- Double: –1 + 1023 = 1022 = 011111111102▪ Single: 1011111101000…00▪ Double: 1011111111101000…00102UC Davis EEC 170, Winter 2021 / © John Owens Floating-Point Example▪ What number is represented by the single-precision float 11000000101000…00- S =1–▪Fraction = 01000…002Fxponent = 100000012 = 129 x = (–1)1 × (1 + .012) × 2(129 – 127)= (–1) × 1.25 × 22 = –5.0103UC Davis EEC 170, Winter 2021 / © John Owens Denormal Numbers▪ Exponent = 000…0 ⇒ hidden bit is 0x=(−1)S ×(0+Fraction)×2−Bias▪ Smaller than normal numbers- allow for gradual underflow, with diminishing precision▪ Denormal with fraction = 000…0x=(−1)S ×(0+0)×2−Bias =±0.0Two representations of 0.0!104UC Davis EEC 170, Winter 2021 / © John Owens Infinities and NaNs▪ Exponent = 111…1, Fraction = 000…0- ±Infinity- Can be used in subsequent calculations, avoiding need for overflow check▪ Exponent = 111…1, Fraction ≠ 000…0- Not-a-Number (NaN)- Indicates illegal or undefined result- e.g., 0.0 / 0.0- Can be used in subsequent calculations105UC Davis EEC 170, Winter 2021 / © John Owens Half-precision Floating Pointhttps://en.wikipedia.org/wiki/Bfloat16_floating-point_format 106 UC Davis EEC 170, Winter 2021 / © John Owens

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS代考计算机代写 computer architecture compiler scheme mips RISC-V algorithm Lecture 6:
30 $