Arithmetic for Computers 2: Floating Point Numbers
CS 154: Computer Architecture Lecture #9
Winter 2020
Ziad Matni, Ph.D.
Dept. of Computer Science, UCSB

Administrative
Lab 4 due today! Lab 5 out soon
Syllabus (Schedule Section) has been updated
2/5/20 Matni, CS154, Wi20

Midterm Exam (Wed. 2/12)
What's on It?
Everything we've done so far, from the start to Monday, 2/10
What Should I Bring?
Your pencil(s), eraser, MIPS Reference Card (on 1 page)
You can bring 1 sheet of hand-written notes (turn it in with exam). 2 sides ok.
What Else Should I Do?
IMPORTANT: Come to the classroom 5-10 minutes EARLY
If you are late, I may not let you take the exam
IMPORTANT: Use the bathroom before the exam; once inside, you cannot leave
Random seat assignments
Bring your UCSB ID

Lecture Outline
Floating Point Number Representations
IEEE 754 F-P Standard
Arithmetic in F-P
Instructions for F-P
Hardware implementations

Floating Point
Representation for non-integral numbers
Including very small and very large numbers
Usually follows some normalized form of scientific notation

Floating Point Numbers in CPUs
We need 3 pieces of information to produce a binary floating point number:
+/- N x 2^E
The sign of the number (positive or negative)
The mantissa (aka significand) of the number, N
The exponent of the number, E

Representation in MIPS (Single Precision)
The actual form is: (-1)^S x (1 + Fraction) x 2^(Exponent - Bias)
Called the IEEE 754 F-P Standard (more on this coming up)
MIPS design for single-precision has:
8 bits for exponent and 23 bits for fraction
Gives a range from roughly 2.0 x 10^-38 to 2.0 x 10^38: quite large!
Overflow can occur: here it means that the exponent is too large to be represented in the exponent field.
Underflow can also occur: the exponent is negative and too large in magnitude to be represented in the exponent field.
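A minimal sketch of overflow and underflow in C++, assuming float is an IEEE 754 single and the default rounding mode is in effect. The function names are hypothetical, chosen for this illustration. On such a platform an overflowing result is typically reported as infinity, and a result far below the representable range becomes zero.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Doubling the largest finite float would need an exponent of 128,
// which does not fit in the 8-bit exponent field: overflow.
bool doubling_max_overflows() {
    volatile float big = std::numeric_limits<float>::max();  // volatile: force a run-time multiply
    const float result = big * 2.0f;
    return std::isinf(result);
}

// Squaring the smallest normalized float would need an exponent of -252,
// far below what the exponent field can hold: underflow.
bool squaring_min_underflows() {
    volatile float tiny = std::numeric_limits<float>::min();
    const float t = tiny;
    const float result = t * t;
    return result == 0.0f;
}

// Self-check under the IEEE 754 assumption stated above.
void check_overflow_underflow() {
    assert(doubling_max_overflows());
    assert(squaring_min_underflows());
}
```

Calling check_overflow_underflow() from main is enough to try it; compiler options that relax FP semantics (for example fast-math modes) may change the behavior.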

Double Precision Floating Points
Single Precision is float in C/C++
Double Precision is double in C/C++
64 bits (2 words) instead of 32 bits
11 bits for exponent (instead of 8)
52 bits for fraction (instead of 23)
Gives a wider range and greater precision than single-precision
Range is: roughly 2.0 x 10^-308 to 2.0 x 10^308
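The field widths above can be cross-checked from C++. This is a sketch that assumes float and double are the IEEE 754 single and double formats (true on mainstream platforms, but not guaranteed by the language); fraction_bits and exponent_bits are hypothetical helper names.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Fraction bits stored in the word: the significand precision
// minus the implicit leading 1.
template <typename T>
constexpr int fraction_bits() {
    return std::numeric_limits<T>::digits - 1;
}

// Exponent bits: whatever remains after the sign bit and the fraction.
// Only meaningful for float and double under the IEEE 754 assumption.
template <typename T>
constexpr int exponent_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT) - 1 - fraction_bits<T>();
}

// Self-check against the numbers on the slide.
void check_field_widths() {
    assert(fraction_bits<float>() == 23);
    assert(exponent_bits<float>() == 8);
    assert(fraction_bits<double>() == 52);
    assert(exponent_bits<double>() == 11);
}
```

The helpers deliberately derive the widths from std::numeric_limits rather than hard-coding them, so a platform with a different layout would fail the check instead of silently agreeing.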

IEEE 754 Floating-Point Standard
Includes single and double-precision definitions (since 1980s)
Very widespread in almost all CPUs today
S = 0 → positive, S = 1 → negative
The 1 in 1 + Fraction is implicit
The Bias is 127 for single-precision and 1023 for double-precision
Examples with single-precision:
S=0, E=0x82, F=0 is: (+1) x (1 + 0) x 2^(130-127) = 1 x 2^3 = 8
S=0, E=0x83, F=0x600000 is: (+1) x (1 + 0.11 (bin)) x 2^(131-127) = 1.11 (bin) x 2^4 = 11100 (bin) = 28
Useful website: https://www.h-schmidt.net/FloatConverter/IEEE754.html
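The decoding rule can be sketched in C++ as follows. The Fields struct and the function names are hypothetical, introduced only for this illustration, and the decoder handles normalized numbers only (exponent field neither 0x00 nor 0xFF).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type for this sketch (not from the lecture).
struct Fields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 bits
};

// Split a 32-bit single-precision word into S, E, and F.
Fields split_fields(std::uint32_t word) {
    return Fields{word >> 31, (word >> 23) & 0xFFu, word & 0x7FFFFFu};
}

// Value of a normalized number: (-1)^S x (1 + F / 2^23) x 2^(E - 127).
double decode_normal(std::uint32_t word) {
    const Fields f = split_fields(word);
    assert(f.exponent != 0x00u && f.exponent != 0xFFu);  // reserved exponents are out of scope
    const double significand = 1.0 + static_cast<double>(f.fraction) / 8388608.0;  // 8388608 = 2^23
    const double magnitude = std::ldexp(significand, static_cast<int>(f.exponent) - 127);
    return f.sign ? -magnitude : magnitude;
}
```

For the two examples above, S=0, E=0x82, F=0 packs into the word 0x41000000 and S=0, E=0x83, F=0x600000 packs into 0x41E00000; decode_normal returns 8 and 28 for them.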

More Examples!
Hex word for single-precision F-P is: 0x3FA00000
So:
0011 1111 1010 0000 0000 0000 0000 0000
S=0 E=0x7F=127 F=0100...0
So:
Number = (+1) x (1 + 0.01 (bin)) x 2^(127-127) = 1.01 (bin) = 1 + 1 x 2^-2 = 1.25

Yet More Examples!!
Hex word for single-precision F-P is: 0xBF300000
So:
1011 1111 0011 0000 0000 0000 0000 0000
S=1 E=0x7E=126 F=0110...0
So:
Number = (-1) x (1 + 0.011 (bin)) x 2^(126-127)
= -1.011 (bin) x 2^-1
= -(1 + (1 x 2^-2) + (1 x 2^-3)) x 2^-1
= -(1 + 0.25 + 0.125) x 0.5
= -0.6875
2^-1 = 0.5
2^-2 = 0.25
2^-3 = 0.125
2^-4 = 0.0625
2^-5 = 0.03125

Even More Examples!!!
What is the single-precision word (in hex) of the F-P number 29.125?
Ok, here we go:
I am reminded that 0.125 = 2^-3
And, I know that 29 in binary is: 11101
So 29.125 (dec) = 11101.001 (bin) = 1.1101001 (bin) x 2^4
This is a positive number, so S = 0
F = 1101001 0000...0 (23 bits in all)
E = 4 + 127 = 131 = 10000011 (bin)
So:
Number in bin = 0 10000011 11010010000000000000000
or 0100 0001 1110 1001 0000 0000 0000 0000 = 0x41E90000
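The hand encoding can be checked against what the compiler stores. This is a sketch assuming float is a 32-bit IEEE 754 single; float_bits is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw 32-bit pattern of a float.
// memcpy is used because it is the portable way to reinterpret the bytes.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Self-check against the worked example: 29.125 encodes as 0x41E90000.
void check_worked_example() {
    assert(float_bits(29.125f) == 0x41E90000u);
}
```

The same helper reproduces the earlier decoding examples in reverse: float_bits(1.25f) gives 0x3FA00000 and float_bits(-0.6875f) gives 0xBF300000.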

Special Exponent Values
Consider Single-Precision Numbers:
Exponents 0x00 and 0xFF are reserved
Smallest exponent is 1 → Actual exponent = 1 - 127 = -126
Smallest fraction is 0
So, I get 1.0 x 2^-126 ≈ 1.2 x 10^-38
Largest exponent is 0xFE = 254 → Actual exp. = 254 - 127 = 127
Largest fraction is 111...11, which approaches 1
So, I get ≈ 2.0 x 2^+127 ≈ 3.4 x 10^+38
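These two limits can be rebuilt from their exponent and fraction and compared with what the C++ standard library reports. A sketch, assuming float is an IEEE 754 single; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Smallest normalized single: E = 1, F = 0, i.e. 1.0 x 2^-126.
float smallest_normal() {
    return std::ldexp(1.0f, -126);
}

// Largest finite single: E = 0xFE, F = all ones, i.e. (2 - 2^-23) x 2^127.
float largest_normal() {
    const float significand = 2.0f - std::ldexp(1.0f, -23);  // 1.111...1 (bin), exact in a float
    return std::ldexp(significand, 127);
}

// Cross-check against the limits reported by the standard library.
void check_limits() {
    assert(smallest_normal() == std::numeric_limits<float>::min());
    assert(largest_normal() == std::numeric_limits<float>::max());
}
```

Note that std::numeric_limits<float>::min() is the smallest normalized positive value, which is the quantity computed here, not the most negative float.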

Special IEEE 754 Values
IEEE 754 allows for special symbols to represent unusual events
When S=0, E=0xFF, F=0, IEEE calls the number inf (i.e. infinity)
-inf is when S=1, E=0xFF, F=0
These are to optionally allow programmers to divide by 0.
When E=0xFF and F is nonzero, the value represents the result of an invalid operation
These are called Not a Number or NaN
Example: 0/0, inf - inf, etc.
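A small sketch of how the reserved exponent 0xFF separates these cases. The classify function and its return strings are hypothetical, made up for this illustration; everything with another exponent is simply reported as "finite".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Classify a single-precision word by its exponent and fraction fields.
std::string classify(std::uint32_t word) {
    const std::uint32_t sign = word >> 31;
    const std::uint32_t exponent = (word >> 23) & 0xFFu;
    const std::uint32_t fraction = word & 0x7FFFFFu;
    if (exponent == 0xFFu) {
        if (fraction != 0u) return "NaN";  // E = 0xFF, F nonzero
        return sign ? "-inf" : "+inf";     // E = 0xFF, F = 0
    }
    return "finite";
}

// Self-check with hand-built words.
void check_classify() {
    assert(classify(0x7F800000u) == "+inf");  // S=0, E=0xFF, F=0
    assert(classify(0xFF800000u) == "-inf");  // S=1, E=0xFF, F=0
    assert(classify(0x7FC00000u) == "NaN");   // E=0xFF, F nonzero
}
```

In everyday code the standard library functions std::isinf and std::isnan from <cmath> answer the same questions without touching the bits.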

Floating-Point Addition
Consider a 4-digit decimal example:
9.999 x 10^1 + 1.610 x 10^-1
1. Align decimal points
Shift number with smaller exponent
9.999 x 10^1 + 0.016 x 10^1
2. Add significands
9.999 x 10^1 + 0.016 x 10^1 = 10.015 x 10^1
3. Normalize result & check for over/underflow
1.0015 x 10^2
4. Round and renormalize if necessary (what? why? Be patient)
1.002 x 10^2

Floating-Point Addition
Consider a 4-digit binary example:
1.000 x 2^-1 + -1.110 x 2^-2 (i.e. 0.5 + -0.4375)
1. Align binary points
Shift number with smaller exponent
1.000 x 2^-1 + -0.111 x 2^-1
2. Add significands
1.000 x 2^-1 + -0.111 x 2^-1 = 0.001 x 2^-1
3. Normalize result & check for over/underflow
1.000 x 2^-4
4. Round and renormalize if necessary
1.000 x 2^-4 (no change) = 0.0625
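The four steps can be sketched on a toy 4-bit format. Everything here is an assumption made for illustration, not the hardware algorithm: the Toy struct, the scaling of the significand by 2^3, truncation of the shifted operand (as in the slide examples), round-half-up, normalized inputs, and the omission of the overflow/underflow range check.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <utility>

// Toy format: value = sig x 2^(exp - 3). A normalized |sig| is in [8, 15],
// i.e. the 4-bit pattern 1.xxx stored as an integer.
struct Toy {
    int sig;  // signed significand, scaled by 2^3
    int exp;  // unbiased exponent
};

double toy_value(Toy t) {
    return std::ldexp(static_cast<double>(t.sig), t.exp - 3);
}

Toy toy_add(Toy a, Toy b) {
    // Step 1: align. Shift the operand with the smaller exponent right.
    if (a.exp < b.exp) std::swap(a, b);
    int shift = a.exp - b.exp;
    if (shift > 4) shift = 4;  // every bit of a 4-bit significand is gone by then
    const int b_mag = std::abs(b.sig) >> shift;  // truncates the shifted-out bits
    const int b_sig = (b.sig < 0) ? -b_mag : b_mag;

    // Step 2: add significands.
    const int sum = a.sig + b_sig;
    if (sum == 0) return Toy{0, 0};
    const bool negative = sum < 0;
    int mag = std::abs(sum);
    int exp = a.exp;

    // Step 3: normalize (exponent range check omitted in this sketch).
    int dropped = 0;
    if (mag >= 16) { dropped = mag & 1; mag >>= 1; ++exp; }  // carry out: shift right
    while (mag < 8) { mag <<= 1; --exp; }                    // leading zeros: shift left

    // Step 4: round (half up), then renormalize if rounding carried out.
    mag += dropped;
    if (mag >= 16) { mag >>= 1; ++exp; }

    return Toy{negative ? -mag : mag, exp};
}
```

With the slide's operands, toy_add(Toy{8, -1}, Toy{-14, -2}) yields sig = 8 and exp = -4, which is 1.000 x 2^-4 = 0.0625.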

Re: Rounding in Binary F-P
Can we create ANY floating point number in binary?
What about 0.3333... (i.e. 1/3)?
In binary, 1/10 is the infinitely repeating fraction 0.0001100110011001100110011001100110011001100...
Since we cannot create ALL F-P numbers in binary, rounding (i.e. approximating) is necessary
Many users are not aware of the approximation because of the way values are displayed
The actual stored value is the nearest representable binary fraction
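The repeating pattern can be generated by long division in base 2. This is a sketch using exact integer arithmetic; binary_digits is a hypothetical helper name, and it expects a proper fraction (0 <= numerator < denominator).

```cpp
#include <cassert>
#include <string>

// First `count` binary digits after the point of numerator/denominator.
std::string binary_digits(int numerator, int denominator, int count) {
    assert(denominator > 0 && numerator >= 0 && numerator < denominator);
    std::string digits;
    int remainder = numerator;
    for (int i = 0; i < count; ++i) {
        remainder *= 2;  // shift left by one binary place
        if (remainder >= denominator) {
            digits += '1';
            remainder -= denominator;
        } else {
            digits += '0';
        }
    }
    return digits;
}
```

For 1/10 the first twelve digits are 000110011001, and the remainder never reaches zero, so the expansion cannot terminate. For 1/8 the digits are 001 and the remainder is then zero, which is why 0.125 is stored exactly.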

C++ Program to Illustrate Rounding in Binary F-P
#include <iostream>
#include <iomanip>
int main()
{
    // Try running the program without the next 2 lines
    // as a comparison. Or change the precision number around.
    std::cout << std::setprecision(30);
    std::cout << std::fixed;
    // 1/3 stored in single and in double precision
    float a = 1.0/3;
    double b = 1.0/3;
    std::cout << a << "\n" << b << "\n";
    // 1/10 stored in single and in double precision
    float x = 1.0/10;
    double y = 1.0/10;
    std::cout << x << "\n" << y << "\n";
}

Floating-Point Adder Hardware
Much more complex than integer adder
Remember the 4 steps from a couple of slides ago?
Doing it in one clock cycle would take too long
Would force a slower clock on the system
How much we can do in 1 clock cycle is a matter for later discussion
FP adder usually takes several cycles
Can be pipelined for more efficient operation

FP Adder Hardware
[Figure not reproduced]

Other FP Arithmetic Hardware
FP multiplier is of similar complexity to FP adder
But uses a multiplier for significands instead of an adder
FP arithmetic hardware (incl. addition) is usually in a co-processor & does:
Addition, subtraction, multiplication, division, reciprocal, square-root
FP ↔ integer conversion
Operations usually take several cycles
Can be pipelined

MIPS FP Instructions
                 Single-Precision   Double-Precision
Addition         add.s              add.d
Subtraction      sub.s              sub.d
Multiplication   mul.s              mul.d
Division         div.s              div.d
Comparisons      c.xx.s             c.xx.d
Load             lwc1               ldc1
Store            swc1               sdc1
Where xx can be eq, neq, lt, gt, le, ge
Example: c.eq.s
Also, F-P branch, true (bc1t) and branch, false (bc1f)

MIPS FP Instructions
FP instructions operate only on FP registers
Programs generally don't do integer ops on FP data, or vice versa
More registers with minimal code-size impact

The Floating Point Registers
MIPS has 32 separate registers for floating point: $f0, $f1, etc.
Paired for double-precision: $f0/$f1, $f2/$f3, etc.
Example MIPS assembly code:
lwc1 $f4, 0($sp)      # Load 32b F.P. number into F4
lwc1 $f6, 4($sp)      # Load 32b F.P. number into F6
add.s $f2, $f4, $f6   # F2 = F4 + F6 single precision
swc1 $f2, 8($sp)      # Store 32b F.P. number from F2

Example Code
C++ code:
float f2c (float fahr) {
    return ((5.0/9.0)*(fahr - 32.0));
}
Assume: fahr in $f12, result in $f0, constants in global memory space (i.e. defined in .data)
Compiled MIPS code:
f2c:
    lwc1 $f16, const5        # $f16 = 5.0
    lwc1 $f18, const9        # $f18 = 9.0
    div.s $f16, $f16, $f18   # $f16 = 5.0 / 9.0
    lwc1 $f18, const32       # $f18 = 32.0
    sub.s $f18, $f12, $f18   # $f18 = fahr - 32.0
    mul.s $f0, $f16, $f18    # $f0 = (5.0/9.0) * (fahr - 32.0)
    jr $ra                   # return

YOUR TO-DOs for the Week
Readings!
Work on Lab 5!
Start studying for the midterm!
