ECE437: Introduction to Digital Computer Design
Chapter 2 Instructions Spring 2021
Announcements
Lab1 is on basic 270/337 material and is not covered in 437 lectures
Homeworks given out every Friday, due the next Friday at the start of the lecture
Slidesandhomeworkspostedonthecoursewebsite Slides will be posted before each lecture so you
can print before the lecture for taking notes
Please speak your questions/answers Unmute, speak, mute (Chat is no good)
ECE437,S21 Vijaykumar and Thottethodi (2) 1/27/2021
Instructions
Instructions are the words of a computer
Instruction set architecture (ISA) is its vocabulary
With a few other things, this defines the interface of computers
But implementations vary
We will study MIPSTM ISA
but the next ISA is not too hard after the first
We wont write much assembly code ECE362
ECE437,S21 Vijaykumar and Thottethodi (3) 1/27/2021
Computer What does it do?
Fetch instruction from address in Program Counter (PC)
Execute instruction
Update PC to point to next instruction
ECE437,S21 Vijaykumar and Thottethodi
Fetch [PC]
Execute
(4)
Update PC
1/27/2021
1
Outline
Registers and ALU ops Memory and load/store Branches and jumps
And more . . .
Read Chapter 2 and skim through Appendix E (on CD) for MIPS ISA
ECE437,S21 Vijaykumar and Thottethodi (5) 1/27/2021
Instructions Basics
Basics
C statement
f=(g+h)-(i+j)
MIPS instructions*
addt0,g,h //t0=g+h addt1,i,j // t1=i+j subf,t0,t1 // f =t0t1
Opcodes/mnemonic, operands, source/destination
ECE437,S21 Vijaykumar and Thottethodi (6) 1/27/2021
Basics
Opcode: Specifies the kind of operation (mnemonic)
Operands: input and output data (source/destination)
Operands t0, t1 are temps
One operation, two inputs, one output
Multiple instructions for one C statement
ECE437,S21 Vijaykumar and Thottethodi (7) 1/27/2021
Why not have bigger instructions?
Whynothavef=(g+h)-(i+j)asoneinstruction?
Church-Turingthesis:Averyprimitivecomputercan
compute anything that a fancy computer can compute need only logical functions, read and write memory and data-
dependent decisions
Therefore,ISAselectedforpracticalreasons: performance and cost, not computability
Simplicity/regularitytendtoimproveboth
E.g.,H/Wtohandlearbitrarynumberofoperands
complex, slow and rarely usable and NOT NECESSARY
ECE437,S21 Vijaykumar and Thottethodi (8) 1/27/2021
2
Registers and ALU ops
Ok,Ilied!
Operandsmustberegisters,notvariables
add $8, $17, $18 // $8 <- $17 + $18 destination LEFT MOST if you confuse this, you wont get anything right in 437 add$9,$19,$20 sub$16,$8,$9 MIPShas32registers$0-$31(figurenextslide) $0always0(writingto$0hasnoeffectanoop(NOP)) Registers$8,$9aretemps,$16-f,$17-g,$18-h, MIPSalsoallowsoneconstantcalledimmediate laterwewillseeimmediateis16bits[-32768,32767]ECE437,S’21 Vijaykumar and Thottethodi (9) 1/27/2021$19 – i, $20 j Registers and ALU $0 $1$31ALU ECE437,S’21 Vijaykumar and Thottethodi (10) 1/27/2021 ALU ops Some ALU ops: add, addi, addu, addiu (immediate, unsigned) Caution: casting signed to unsigned in System Verilog may cause weirdness sub… mul, div and, andi or, ori sll,srl,…- weird! Why registers? Specifiers fit in instructions, smaller memory is faster Are registers enough?ECE437,S’21 Vijaykumar and Thottethodi (11) 1/27/2021 Memory and load/store But need more than 32 words of storage Memory – an array of locations M[addr], indexed by addr (figure next slide) Data movement (on words or integers) load word for reg <– memory lw $17, 1002 // get input g store word for reg –> memory
sw $16, 1001 // save output f
Note: destination LAST for stores!
ECE437,S21 Vijaykumar and Thottethodi (12) 1/27/2021
3
Memory and load/store
$0 $1
$31
ALU
ECE437,S21 Vijaykumar and Thottethodi
0 1 2 3
1001 f 1002 g
Maxmem
(13) 1/27/2021
Byte vs. Word addresses
I lied again!
we need to address bytes for characters
words take 4 bytes
therefore, word addresses must be multiples of 4 (i.e., end in 00)
Addresses are ALWAYS byte addresses (next slide)
NOT word addresses (previous slide) figure next slide
ECE437,S21 Vijaykumar and Thottethodi (14) 1/27/2021
Memory and load/store
$0 $1
$31
ALU
ECE437,S21 Vijaykumar and Thottethodi
0 4 8 0xC
0x4004 f 0x4008 g
Maxmem
(15) 1/27/2021
Memory and load/store
Important for arrays
A[i] = A[i] + h (figure next slide)
# $8 temp, $18 h, $21 (i x 4)
Astart is 0x8000, A is an array of words NOT bytes
lw $8, Astart($21) // == 0x8000($21)
// Astart or 0x8000 is offset
add $8, $18, $8
// $8 <- M[Astart+$21]// or $8 <- M[0x8000+$21]sw $8, Astart($21) //offset is signed, can be negative MIPS has other load/store for byte/halfwordECE437,S’21 Vijaykumar and Thottethodi (16) 1/27/20214Memory and load/store$0 $1$31ALUECE437,S’21 Vijaykumar and Thottethodi0 4 8 0xC0x4004 f 0x4008 g0x8000 A[0] 0x8004 A[1] 0x8008 A[2]Maxmem(17) 1/27/2021 Endian Word storage Word = bytes (0x400,0x401,0x402,0x403)? Word = bytes (0x403,0x402,0x401,0x400)? It depends… Big endian: MSB at address xxxxxx00 e.g., IBM, SPARC Little endian: MSB at address xxxxxx11 e.g., Intel x86 Mode selectable e.g., PowerPC, MIPSECE437,S’21 Vijaykumar and Thottethodi (18) 1/27/2021 Branches and JumpsWhile ( i != j) { j= j + i;i= i + 1; } HLL->Assembly,$8isi,$9isj
Loop: beq $8, $9, Exit
// note that the condition is reversed
add $9, $9, $8 addi $8, $8 , 1 j Loop
Exit:
ECE437,S21 Vijaykumar and Thottethodi (19) 1/27/2021
Branches and Jumps
Better yet:
beq $8, $9, Exit
Loop: add $9, $9, $8 addi $8, $8 , 1
bne $8, $9, Loop Exit:
// not !=
Let compilers worry about such optimizations
ECE437,S21 Vijaykumar and Thottethodi (20) 1/27/2021
5
Branches and Jumps
What does bne do really?
read $8; read $9, compare
setPC=PC+4orPC=Target
To do compares other than equal or not equal
e.g., blt $8, $9, Target // assembler PSEUDO-
expands to
slt $1, $8, $9 // set less than $1 = ($8 < $9)?1:0 bne $1, $0, Target // $0 is always 0ECE437,S’21 Vijaykumar and Thottethodi (21) 1/27/2021instruction Branches and Jumps Other MIPS branches/jumps beq $8, $9, imm //if($8==$9)PC=PC+imm<<2elsePC+=4 bne … slt, sle, sgt, sge with immediate, unsigned j addr jr $12 jal addr// PC = addr //PC=$12// $31 = PC+4; PC = addr; used for procedure callsECE437,S’21 Vijaykumar and Thottethodi (22) 1/27/2021 Two-minute quizFill in the blanks Abstraction separates __________ from ___________True or False: A computers instruction set typically decides the subset of programs that the computer can run and the instruction set has to be expanded to run other programsECE437,S’21 Vijaykumar and Thottethodi (23) 1/27/2021 Assembly Exercise for(i=0;i
a = a+b else
b = a+b
ECE437,S21 Vijaykumar and Thottethodi
(25) 1/27/2021
Assembly Exercise
HLL-> assembly if (a=b)
a = a+b else
lw$8,a //a->$8 lw$9,b //b->$9 beq $8,$9,L_IF
j L_ELSE
L_IF:
add $8,$8,$9 // a = a+b sw $8,a // write a to mem j EXIT
b = a+b
L_ELSE:
add $9,$8,$9 // b= a+b
sw $9, b // write b to mem EXIT:
ECE437,S21 Vijaykumar and Thottethodi (26) 1/27/2021
Procedure Calls See section 2.7 for more details
save registers
set up parameters call procedure
get results
restore registers
Proc: save more registers do function
set up results
restore more registers return
jal (call) is the ONLY operation in hardware: the rest is in software
jal addr // $31 = PC+4 (return addr); PC = addr (UNLIKE 362 where return addr is saved on stack too complicated/slow to do in fast clock)
ECE437,S21 Vijaykumar and Thottethodi (27) 1/27/2021
Stack
An important data structure for calls
Stack grows from larger to smaller addresses $29 is the stack pointer, points to top of stack
push $2:
addi $29, $29, -4 sw $2, 4($29)
pop $2:
lw $2, 4($29) addi $29, $29, 4
push, pop NOT real instructions
Addi-sw/lw-addi order cannot change. Why?
sw $2, 0($29)
addi $29, $29, -4
ECE437,S21 Vijaykumar and Thottethodi (28) 1/27/2021
7
Procedure Example
HLL code for swap
{ temp = v[k]; /* code for swap */
v[k] = v[k+1]; v[k+1] = temp; return (temp); }
Corresponding assembly code
swap: addi $29, $29, -8 // save registers
sw $15, 4($29) sw $16, 8($29)
ECE437,S21 Vijaykumar and Thottethodi (29) 1/27/2021
Procedure Example
muli* add lw
lw sw sw mov* lw
ECE437,S21 Vijaykumar and Thottethodi
$2, $5, 4 $2, $4, $2 $15, 0($2) $16, 4($2) $16, 0($2) $15, 4($2)
$2, $15 $15, 4($29) $16, 8($29) $29, $29, 8
// procedure body; $2 <- k*4// $2 <- v + k*4 == address of v[k]// $15 <- v[k] // $16 <- v[k+1] // v[k] <- $16 // v[k+1] <- $15// return value// restore registers// returnlwaddijr Allpush/popins/w$31(30) 1/27/2021 Procedure Call Example Now, calling the procedure HLL codereturn_value = swap(arr_A, i); /* call swap */mov $4, $18 mov $5, $17 jal swap mov $20, $2ECE437,S’21 Vijaykumar and Thottethodi// first parameter // second parameter// copy return value(31) 1/27/2021 Layers of Software Notation – program: input data -> output data executable: input data -> output data
loader: executable file -> executable in memory
linker: object files -> executable file
assembler: assembly file -> object file compiler: HLL file -> assembly file
editor: editor commands -> HLL file
Possible only because programs can be manipulated as data
stored program computer
ECE437,S21 Vijaykumar and Thottethodi (32) 1/27/2021
8
MIPS Machine Language
All 32-bit instructions
Assembly: add $1, $2, $3
14 ASCII (or Unicode) codes 6164642024312C202432 2C202433(hex)
Machine code: 1 word
000000 00010 00011 00001 00000 010000
ECE437,S21 Vijaykumar and Thottethodi (33) 1/27/2021
alu-rr
2
3
1
zero
add/signed
Instruction Format
R-format:
opcode rs rt rd shamt funct
655556
Digression:
Howdoyoustorethenumber4,392,976? Sameasadd$1,$2,$3
Storedprogram:instructionsarerepresentedas numbers
programscanberead/writteninmemorylikenumbers
Other R-format: addu, sub*, and, or .. etc
ECE437,S21 Vijaykumar and Thottethodi (34) 1/27/2021
Instruction Format
Assembly: lw $1, 100($2) Machine:
100011 00010 00001 0000000001100100
lw 2 1 100 (in signed binary)
I-format:
opcode rs rt address/immediate
Addr/immediate is 16 bits signed 2s complement
ECE437,S21 Vijaykumar and Thottethodi (35) 1/27/2021
6
5
5
16 (signed)
Instruction Format
I-format also used for ALU ops with immediates
addi $1, $2, -4
001000 00010 00001 1111111111111100
What about constants larger than 16 bits = [-
e.g., 0000 0000 0000 1100 0000 0000 0000 1111 or 0x000C000F?
lui $4, 0xC // $4 <- 0x000C0000ori $4,$4, 0xF // $4 <- 0x000C000F All loads and stores use I-formatECE437,S’21 Vijaykumar and Thottethodi (36) 1/27/202132768, 32767]?9Instruction Format I-format for branches: beq $1, $2, 7 000100 00001 00010 0000 0000 0000 0111 PC=PC+(00000111<<2) //wordoffset Finally, J-format: opcode addr6 26 addr is weird in J-format: Target address = 4 MSB of PC:addr:00 (4+26+2 = 32) Why MSB of PC? We have 26, we need 30 Can we Pad with 0s? Restricts jumps to top16th of address spaceECE437,S’21 Vijaykumar and Thottethodi (37) 1/27/2021 MIPS: branch vs jump Branch: 16 bit displacement Current word +/- 215 Range is symmetric Range(forward) = Range(backward) = 2150xE000 0000 Jump: 26 lower-bit concatenation in word address Jump anywhere within a 226 segment Range may be asymmetric Range(forward) + Range(backward) = 226ECE437,S’21 Vijaykumar and Thottethodi (38)215215226-X 0xEFFF FFFF1/27/2021X Summary: Instruction Formats R-format: opcode rs rt rd shamt funct I-format: opcode rs rt address/immediate J-format: opcode addr Instr decode Theory: Inst bits -> identify instrs -> control signals
Practice: Instruction bits -> control signals
ECE437,S21 Vijaykumar and Thottethodi (39) 1/27/2021
6
5
5
5
5
6
6
5
5
16
6
26
Addressing modes
There are many ways of accessing data 1. Register addressing
add $1, $2, $3
5
ECE437,S21 Vijaykumar and Thottethodi (40) 1/27/2021
opcode
rs
rt
rd
Register file
shamt
funct
10
Addressing Modes
2. Base addressing (aka displacement)
lw $1, 100($2) // $2 == 400, M[500] == 42
ECE437,S21 Vijaykumar and Thottethodi (41) 1/27/2021
opcode
rs
rt
address (= 100)
5
Register file 400
Memory 42
Addressing Modes
3. Immediate addressing addi $1, $2, 100
ECE437,S21 Vijaykumar and Thottethodi (42) 1/27/2021
opcode
rs
rt
immediate
Addressing Modes
4. PC relative addressing
beq$1,$2,100//if($1==$2)PC=PC+100
ECE437,S21 Vijaykumar and Thottethodi (43) 1/27/2021
opcode
rs
rt
displacement (= 100)
Memory
PC
Addressing Modes
Pseudodirect addressing
j Loop // instruction contains address*
ECE437,S21 Vijaykumar and Thottethodi (44) 1/27/2021
opcode
PC
address
Memory
11
Addressing Modes
NOT found in MIPS
5. Indexed: add two registers base + index
6. Indirect: M[M[addr]] two memory references
7. Autoincrement/decrement: add data size
8. Autoupdate found in IBM PowerPC, HP PA-RISC
like displacement but update register
ECE437,S21 Vijaykumar and Thottethodi (45) 1/27/2021
Addressing Modes
Autoupdate
lwupdate $1, 4[$2] // $1 == M[4+$2]; $2 == $2 + 4
ECE437,S21 Vijaykumar and Thottethodi (46) 1/27/2021
opcode
rs
5
rt
Register file
address
Memory
Delay
Addressing Modes
for (i = 0, I < N, i +=1) sum += A[i]; $7-sum,$8-addressofa[i],$2tmpinner:lw $2, 0($8) addi $8, $8, 4 add $7, $7, $2new inner:lwupdate $2, 4($8)add $7, $7, $2 any problems with new inner ?ECE437,S’21 Vijaykumar and Thottethodi (47) 1/27/2021 How to Choose ISA Minimize what? Two way constraint satisfaction What can be implemented in hardware? From below What ISA is better for optimized compilation? From above In 1985-1990 technology, simple modes like MIPS good Easy to implement, not a real reason in the business world Very compelling reason in the classroom As technology changes, computer design options change For small memory, dense instructions important and dense usually complex/irregular For high speed, pipelining important and for pipelining simple/regular importantECE437,S’21 Vijaykumar and Thottethodi (48) 1/27/202112Intel 80386 ISA 8 32-bit registers Two register machine: src1/dst src2 reg-reg,reg-immed,reg-mem,mem-reg,mem – imm seven addressing modes 16-bit and 32-bit ops on memory and registers 8-bit prefix to override default data size condition codes part of normal operation, extends operation time, often not looked atECE437,S’21 Vijaykumar and Thottethodi (49) 1/27/2021 Intel 80386 ISA Decoding nightmare instructions 1 to 17 bytes prefixes, postfixes crazy formats – e.g., register specifiers move around but key instructions not terrible yet have to make ALL work correctlyECE437,S’21 Vijaykumar and Thottethodi (50) 1/27/2021 One-minute quiz In stored-program computers, what is the difference between how code and data are handled (i.e., read from/ written to memory and manipulated)? Previous quiz answers Abstraction separates the interface (OR the what) from the implementation (OR the how) A computers instruction set typically decides the subset of programs that the computer can run and the instruction set has to be expanded to run other programs. FalseECE437,S’21 Vijaykumar and Thottethodi (51) 1/27/2021Current Approach Current Intel chips Instruction decode logic translates into RISCy Ops RISC Reduced Instruction Set Computer Execution unit runs RISCy ops Backward compatibility Complex decoding Execution unit as fast as RISC We work with MIPS to keep it simple Learn x86 on the job!ECE437,S’21 Vijaykumar and Thottethodi (52) 1/27/202113Complex Instructions More powerful instructions not necessarily faster execution E.g. – string copy Option 1: move with repeat prefix for memory to memory move special-purpose Option 2: use loads into register and then stores generic instructions Option 2 faster on the same machine!ECE437,S’21 Vijaykumar and Thottethodi (53) 1/27/2021 OutlineSo far, too MIPS-centric Rich design space:Other possibilities: Stack and Accumulator architectures :Ch 2 (on CD)ECE437,S’21 Vijaykumar and Thottethodi (54) 1/27/2021 General Purpose Architecture General Purpose Register architecture 3 operands: 2 source + 1 dest Depends on addressing modes Register-register only MIPS : all register operands except load/store Memory-memory (all memory operands) Eg f = (g+h) *(i-j)ECE437,S’21 Vijaykumar and Thottethodi (55) 1/27/2021 Add t1, g, h Sub t2, i, j Mul f, t1, t2 Stack Machine Push, Pop with one address Last in, first out Binary/Unary Op operates on Implicit operands Top one/two element(s) of stack Eg f = (g+h) *(i-j)ECE437,S’21 Vijaykumar and Thottethodi (56)1/27/2021 Push g Push h Add Push i Push j Subtract Mul14Accumulator Machine One address 2nd implicit operand: accumulator accumulatoris2nd source operand as well as destination Eg f = (g+h) *(i-j)ECE437,S’21 Vijaykumar and Thottethodi(57) 1/27/2021 Loadg //Acc=g Add h // Acc = Acc+h Storet //t=g+h Loadi //Acc=iSubt j // Acc = Acc-j Mul t1 // Acc = Acc*t Store f Concluding Remarks Simple and regular same length instructions, fields in same place Small and fast Small number of registers Compromises inevitable There is NO PERFECT ISA! Pipelining should not be hindered (more later) Common case fast Read Chapter 2 Read Chapter 4a 4.1-4.4 (coming up next) Ch 4a+b is 1/3rds of this course, labECE437,S’21 Vijaykumar and Thottethodi (58) 1/27/202115
Reviews
There are no reviews yet.