[SOLVED] CS computer architecture compiler jvm c++ assembler concurrency Java RISC-V assembly x86 algorithm cache mips Lecture 4: Instructions: Language of

$25

File Name: CS_computer_architecture_compiler_jvm_c++_assembler_concurrency_Java_RISC-V_assembly_x86_algorithm_cache_mips_Lecture_4:_Instructions:_Language_of.zip
File Size: 1375.32 KB

5/5 - (1 vote)

Lecture 4: Instructions: Language of
the Computer (3/3)
John Owens
Introduction to Computer Architecture UC Davis EEC 170, Winter 2021

From last time
What instructions look like
RISC-V: 32 bit instructions, dierent types (R, I, S, and more)
RISC-V: Instructions either compute something or move something to/from memory
Last lecture: logical, branch, jump instructions Calling procedures
Arguments and return values
Jump instructions and return addresses
Saving registers and RISC-V register conventions
The stack, and memory regions
69 UC Davis EEC 170, Winter 2021 / John Owens

Character Data
Byte-encoded character sets ASCII: 128 characters
95 graphic, 33 control Latin-1: 256 characters
ASCII, +96 more graphic characters Unicode: 32-bit character set
Used in Java, C++ wide characters,
Most of the worlds alphabets, plus symbols
UTF-8, UTF-16: variable-length encodings
70
UC Davis EEC 170, Winter 2021 / John Owens
2.9 Communicating with People

Byte/Halfword/Word Operations

RISC-V byte/halfword/word load/store
Load byte/halfword/word: Sign extend to 64 bits in rd
-lb rd, offset(rs1) -lh rd, offset(rs1) -lw rd, offset(rs1)
Load byte/halfword/word unsigned: Zero extend to 64 bits in rd
-lbu rd, offset(rs1) -lhu rd, offset(rs1) -lwu rd, offset(rs1)
Store byte/halfword/word: Store rightmost 8/16/32 bits
-sb rs2, offset(rs1) -sh rs2, offset(rs1) -sw rs2, offset(rs1)
71
UC Davis EEC 170, Winter 2021 / John Owens

String Copy Example C code:
Null-terminated string
void strcpy (char x[], char y[]) {
size_t i;
i = 0;
while ((x[i]=y[i]) != )
i += 1; }
// C idiom: while (*x++ = *y++);
72
UC Davis EEC 170, Winter 2021 / John Owens

String Copy Example RISC-V code:
x19: save, use as temporary
x6, x7: dont save, also temporaries arguments: x10 = &y, x11 = &x
no return value
strcpy:
addi sp,sp,-8
sd x19,0(sp)
add x19,x0,x0 // i=0 (x19 contains i)
lbu x6,0(x5) add x7,x19,x11 sb x6,0(x7) beq x6,x0,L2 addi x19,x19,1 jal x0,L1
L2: ld x19,0(sp)
addi sp,sp,8
jalr x0,0(x1)
// x6 = y[i]
// x11 = &x x7 = addr of x[i]
// x[i] = y[i]
// if y[i] == 0 then exit
// i = i + 1
// next iteration of loop
// restore saved x19
// pop 1 doubleword from stack
// and return
// adjust stack for 1 doubleword
// push x19
L1: add x5,x19,x10 // x10 = &y x5 = addr of y[i]
73
UC Davis EEC 170, Winter 2021 / John Owens

32-bit Constants
Most constants are small
12-bit immediate is sucient
For the occasional 32-bit constant lui rd, constant
Copies 20-bit constant to bits [31:12] of rd
Extends bit 31 to bits [63:32]
Clears bits [11:0] of rd to 0 lui x19, 976 // 0x003D0
addi x19,x19,1280// 0x500
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0011 1101 0000
0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
74
UC Davis EEC 170, Winter 2021 / John Owens
0000 0000 0011 1101 0000
0101 0000 0000
2.10 RISC-V Addressing for Wide Immediates and Addresses

Branch Addressing
Branch instructions specify
Opcode, two registers, target address
Most branch targets are near branch Forward or backward
SB format: imm[12]
imm[11]
imm [10:5]
rs2
rs1
funct3
imm [4:1]
opcode
The address uses an unusual encoding, which simpli#es data path design but complicates assembly.
PC-relative addressing
Target address = PC + immediate $ 2
Why 2? The RISC-V architects wanted to support the possibility of instructions that are 2 bytes long.
75
UC Davis EEC 170, Winter 2021 / John Owens

Jump Addressing

Use jal x0, Label to jump (goto) to Label UJ format:
Jump and link (jal) target uses 20-bit immediate for larger range Also uses PC-relative addressing

(unconditional jump)
imm[10:1]
imm[19:12]
rd
opcode

For long jumps, eg, to 32-bit absolute address
lui: load address[31:12] to temp register
jalr: add address[11:0] and jump to target
imm[20] imm[11]
5 bits
7 bits
76
UC Davis EEC 170, Winter 2021 / John Owens

RISC-V Addressing Summary
77
UC Davis EEC 170, Winter 2021 / John Owens

RISC-V Encoding Summary
78
UC Davis EEC 170, Winter 2021 / John Owens

Synchronization
Two processors sharing an area of memory
P1 writes, then P2 reads
Data race if P1 and P2 dont synchronize
Result depends of order of accesses Example (next slide):
load balance from memory to register
add $1 to register value
store balance from register to memory
79
UC Davis EEC 170, Winter 2021 / John Owens
2.11 Parallelism and Instructions: Synchronization

Synchronization example
Suppose two cash machines, A and B, are both working on a deposit at the same time. Heres how
the deposit() step typically breaks down into low-level processor instructions:
get balance (balance=0)
add 1
write back the result (balance=1)
When A and B are running concurrently, these low-level instructions interleave with each other
(some might even be simultaneous in some sense, but lets just worry about interleaving for now):
A get balance (balance=0)
A add 1
A write back the result (balance=1)
A get balance (balance=0)
A add 1
A write back the result (balance=1)
B get balance (balance=1)
B add 1
B write back the result (balance=2)
This interleaving is fine we end up with balance 2, so both A and B successfully put in a dollar.
But what if the interleaving looked like this:
B get balance (balance=0)
B add 1
B write back the result (balance=1)
http://web.mit.edu/6.005/www/fa14/classes/17-concurrency/#interleaving 80 UC Davis EEC 170, Winter 2021 / John Owens

Synchronization
Two processors sharing an area of memory
P1 writes, then P2 reads
Data race if P1 and P2 dont synchronize
Result depends of order of accesses Hardware support required
Atomic read/write memory operation
No other access to the location allowed between the read and write
Could be a single instruction

Or an atomic pair of instructions
E.g., atomic swap of register memory 81
UC Davis EEC 170, Winter 2021 / John Owens

Synchronization in RISC-V
Load reserved: lr.d rd,(rs1)
Load from address in rs1 to rd
Place reservation on memory address
Store conditional: sc.d rd,(rs1),rs2
Store from rs2 (the value to be stored) to address in rs1

Fails if location is changed
Returns non-zero value in rd
Succeeds if location not changed since the lr.d Returns 0 in rd
82
UC Davis EEC 170, Winter 2021 / John Owens

Synchronization in RISC-V

Example 1: atomic swap (to test/set lock variable)
again: lr.d x10,(x20)
sc.d x11,(x20),x23 // X11 = status
bne x11,x0,again // branch if store failed

addi x23,x10,0
Example 2: lock
again:

// locked == 1
// read lock
// X23 = loaded value
// X23 and Mem[x20] have swapped
addi x12,x0,1
lr.d x10,(x20)
bne x10,x0,again // check if it is 0 yet
sc.d x11,(x20),x12 // attempt to store locked == 1 bne x11,x0,again // branch if fails
Unlock:
sd x0,0(x20) // free lock
83
UC Davis EEC 170, Winter 2021 / John Owens

Translation and Startup
Many compilers produce object modules directly
84
UC Davis EEC 170, Winter 2021 / John Owens
Static linking
2.12 Translating and Starting a Program

Assembler tasks
Translate assembly instructions into binary
Do stu that makes assembly writers job easier

Pseudoinstructions:
Translate labels to osets (beq a1, a2, Label)

not in instruction set

If its small enough, assembler generates addi If its bigger, lui then addi
mv is a copy instruction (not in instruction set)
li is load immediate (load a number into a register),
85 UC Davis EEC 170, Winter 2021 / John Owens

Producing an Object Module
Assembler (or compiler) translates program into machine instructions
Provides information for building a complete program from the pieces (following is Unix):
Header: describes contents of object module
Text segment: machine code
Static data segment: data allocated for the life of the program
Relocation info: which instructions/data words depend on absolute addresses in this program?
Address space layout randomization (e.g.) requires this
Symbol table: labels that are not de#ned (external references)
Debug info: for associating with source code
86 UC Davis EEC 170, Winter 2021 / John Owens

Linking Object Modules

Could leave location dependencies for #xing by a relocating loader
But with virtual memory, no need to do this
Program can be loaded into absolute location in virtual memory space

Nice example in the book (p. 128)
87 UC Davis EEC 170, Winter 2021 / John Owens
Much faster to link than recompile Produces an executable image
1. Merges segments
2. Resolve labels (determine their addresses)
3. Patch location-dependent and external refs

Loading a Program
Load from image #le on disk into memory
1. 2. 3.
4. 5. 6.
Read header to determine segment sizes Create virtual address space
Copy text and initialized data into memory
Or set page table entries so they can be faulted in Set up arguments on stack
Initialize registers (including sp, fp, gp)
Jump to startup routine
Copies arguments to x10, and calls main
When main returns, do exit syscall
88 UC Davis EEC 170, Winter 2021 / John Owens

Dynamic Linking
Only link/load library procedure when it is called
Requires procedure code to be relocatable
Avoids image bloat caused by static linking of all (transitively) referenced libraries
Automatically picks up new library versions
89 UC Davis EEC 170, Winter 2021 / John Owens

Eect of Compiler Optimization Compiled with gcc for Pentium 4 under Linux
2.6 1.95 1.3 0.65
120000
90000
60000
30000
Relative Performance
Instruction count
00
none O1
O2 O3
none O1 160000
120000
80000
40000
0 none
O2 O3
Clock Cycles
CPI
O1
O2 O3
1.8 1.35 0.9 0.45 0
none O1
O2 O3
90
UC Davis EEC 170, Winter 2021 / John Owens

Eect of Language and Algorithm
3 2.25 1.5 0.75 0
2 1.5 1 0.5 0
3000
2250
1500
750 0
C/none C/O1 C/O2
C/O3 Java/int
Java/JIT
Quicksort Relative Performance
C/none C/O1 C/O2
C/O3 Java/int
Java/JIT
Bubblesort Relative Performance
Quicksort vs. Bubblesort Speedup
C/none C/O1 C/O2
C/O3 Java/int
Java/JIT
91
UC Davis EEC 170, Winter 2021 / John Owens

Lessons Learnt
Instruction count and CPI are not good performance indicators in isolation
Compiler optimizations are sensitive to the algorithm
Java/JIT compiled code is signi#cantly faster than JVM
interpreted
Comparable to optimized C in some cases
Nothing can #x a dumb algorithm!
92 UC Davis EEC 170, Winter 2021 / John Owens

MIPS Instructions
MIPS: commercial predecessor to RISC-V Similar basic set of instructions
32-bit instructions
32 general purpose registers, register 0 is always 0
32 %oating-point registers
Memory accessed only by load/store instructions
Consistent use of addressing modes for all data sizes Dierent conditional branches
For <, <=, >, >=
RISC-V: blt, bge, bltu, bgeu
MIPS: slt, sltu (set less than, result is 0 or 1)
Then use beq, bne to complete the branch
96 UC Davis EEC 170, Winter 2021 / John Owens
2.16 Real Stu!: MIPS Instructions

Instruction Encoding: RISC-V vs. MIPS
97
UC Davis EEC 170, Winter 2021 / John Owens

The Intel x86 ISA
Evolution with backward compatibility 8080 (1974): 8-bit microprocessor
Accumulator, plus 3 index-register pairs 8086 (1978): 16-bit extension to 8080
Complex instruction set (CISC)
8087 (1980): %oating-point coprocessor
Adds FP instructions and register stack 80286 (1982): 24-bit addresses, MMU
Segmented memory mapping and protection 80386 (1985): 32-bit extension (now IA-32)
Additional addressing modes and operations
Paged memory mapping as well as segments
98 UC Davis EEC 170, Winter 2021 / John Owens
2.17 Real Stu!: x86 Instructions

The Intel x86 ISA
Further evolution
i486 (1989): pipelined, on-chip caches and FPU
Compatible competitors: AMD, Cyrix,
Pentium (1993): superscalar, 64-bit datapath
Later versions added MMX (Multi-Media eXtension) instructions
The infamous FDIV bug
Pentium Pro (1995), Pentium II (1997)
New microarchitecture (see Colwell, The Pentium Chronicles) Pentium III (1999)
Added SSE (Streaming SIMD Extensions) and associated registers Pentium 4 (2001)
New microarchitecture
Added SSE2 instructions
99 UC Davis EEC 170, Winter 2021 / John Owens

The Intel x86 ISA

AVX: Advanced Vector Extension (announced 2008)
Longer SSE registers, more instructions
AVX-512 proposed 2013, implemented in Skylake (x86) 2017
And further
AMD64 (2003): extended architecture to 64 bits
EM64T Extended Memory 64 Technology (2004)
AMD64 adopted by Intel (with re#nements)
Added SSE3 instructions
Intel Core (2006)
Added SSE4 instructions, virtual machine support
AMD64 (announced 2007): SSE5 instructions Intel declined to follow, instead
If Intel didnt extend with compatibility, its competitors would! Technical elegance & market success
100
UC Davis EEC 170, Winter 2021 / John Owens

Basic x86 Registers
101
UC Davis EEC 170, Winter 2021 / John Owens

Basic x86 Addressing Modes Two operands per instruction
Source/dest operand
Second source operand
Register
Register
Register
Immediate
Register
Memory
Memory
Register
Memory
Immediate
Memory addressing modes
Address in register
Address = Rbase + displacement
Address = Rbase + 2scale $ Rindex (scale = 0, 1, 2, or 3) Address = Rbase + 2scale $ Rindex + displacement
102
UC Davis EEC 170, Winter 2021 / John Owens

x86 Instruction Encoding Variable length encoding
Post#x bytes specify addressing mode
Pre#x bytes modify operation
Operand length, repetition, locking,
103
UC Davis EEC 170, Winter 2021 / John Owens

Implementing IA-32
Complex instruction set makes implementation dicult
Hardware translates instructions to simpler microoperations
Simple instructions: 11
Complex instructions: 1many
Microengine similar to RISC
Market share makes this economically viable
Comparable performance to RISC
Compilers avoid complex instructions
104 UC Davis EEC 170, Winter 2021 / John Owens

Other RISC-V Instructions
Base integer instructions (RV64I)
Those previously described, plus
auipc rd, immed // rd = (imm<<12) + pc- follow by jalr (adds 12-bit immed) for long jump- slt, sltu, slti, sltui: set less than (like MIPS)- addw, subw, addiw: 32-bit add/sub- sllw, srlw, srlw, slliw, srliw, sraiw: 32-bit shift 32-bit variant: RV32I- registers are 32-bits wide, 32-bit operations105 UC Davis EEC 170, Winter 2021 / John Owens2.18 The Rest of the RISC-V Instruction Set Instruction Set Extensions M: integer multiply, divide, remainder A: atomic memory operations F: single-precision %oating point D: double-precision %oating point C: compressed instructions- 16-bit encoding for frequently used instructions106 UC Davis EEC 170, Winter 2021 / John OwensFallacies Powerful instruction higher performance- Fewer instructions required- But complex instructions are hard to implement- May slow down all instructions, including simple ones- Compilers are good at making fast code from simple instructions Use assembly code for high performance- But modern compilers are better at dealing with modernprocessorsMore lines of code more errors and less productivity-107 UC Davis EEC 170, Winter 2021 / John Owens2.19 Fallacies and Pitfalls Fallacies Backward compatibility instruction set doesnt change – But they do accrue more instructions108UC Davis EEC 170, Winter 2021 / John Owensx86 instruction set Pitfalls Sequential words are not at sequential addresses- MIPS-V addresses are byte addresses- Increment by 4 or 8, not by 1! Keeping a pointer to an automatic variable after procedure returns- e.g., passing pointer back via an argument- Pointer becomes invalid when stack popped109 UC Davis EEC 170, Winter 2021 / John Owens Concluding Remarks Design principles- 1. Simplicity favors regularity- 2. Smaller is faster- 3. Good design demands good compromises Make the common case fast Layers of software/hardware- Compiler, assembler, hardware RISC-V: typical of RISC ISAs- c.f. x86110 UC Davis EEC 170, Winter 2021 / John Owens2.20 Concluding Remarks We likely dont have time for the next few slides Great example though!111 UC Davis EEC 170, Winter 2021 / John OwensC Sort Example Illustrates use of assembly instructions for a C bubble sort function Swap procedure (leaf)void swap(long long int v[],long long int k) { long long int temp; temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;}- v in x10, k in x11, temp in x5 112UC Davis EEC 170, Winter 2021 / John Owens2.13 A C Sort Example to Put It All Together The Procedure Swapswap:slli x6,x11,3addx6,x10,x6ld x5,0(x6)ld x7,8(x6)sd x7,0(x6)sd x5,8(x6)jalr x0,0(x1)// reg x6 = k * 8// reg x6 = v + (k * 8)// reg x5 (temp) = v[k]// reg x7 = v[k + 1]// v[k] = reg x7// v[k+1] = reg x5 (temp)// return to calling routine113UC Davis EEC 170, Winter 2021 / John Owens The Sort Procedure in C Non-leaf (calls swap) void sort (long long int v[], size_t n) {size_t i, j;for (i = 0; i < n; i += 1) {for (j = i 1; j >= 0 && v[j] > v[j + 1];
j -= 1) {
swap(v,j); }
}
}
v in x10, n in x11, i in x19, j in x20 114
UC Davis EEC 170, Winter 2021 / John Owens

The Outer Loop Skeleton of outer loop:
for (i = 0; i = 0 && v[j] > v[j + 1]; j = 1) {
addi x20,x19,-1// j = i 1
for2tst:
bltx20,x0,exit2// go to exit2 if X20 < 0 (j < 0)slli x5,x20,3addx5,x10,x5ld x6,0(x5)ld x7,8(x5)blex6,x7,exit2mv x21, x10mv x22, x11mv x10, x21mv x11, x20 jalx1,swap addi x20,x20,-1 jfor2tstexit2:// reg x5 = j * 8// reg x5 = v + (j * 8)// reg x6 = v[j]// reg x7 = v[j + 1]// go to exit2 if x6 x7// copy parameter x10 into x21// copy parameter x11 into x22// first swap parameter is v// second swap parameter is j// call swap// j = 1// branch to test of inner loop116UC Davis EEC 170, Winter 2021 / John Owens Preserving RegistersPreserve saved registers:addi sp,sp,-40// make room on stack for 5 regssd x1,32(sp)// save x1 on stacksd x22,24(sp) // save x22 on stacksd x21,16(sp) // save x21 on stacksd x20,8(sp) // save x20 on stack sd x19,0(sp) // save x19 on stackRestore saved registers:exit1:sd x19,0(sp)// restore x19 from stacksd x20,8(sp)// restore x20 from stacksd x21,16(sp) // restore x21 from stacksd x22,24(sp) // restore x22 from stacksd x1,32(sp)// restore x1 from stackaddi sp,sp, 40// restore stack pointerjalr x0,0(x1)117UC Davis EEC 170, Winter 2021 / John Owens

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS computer architecture compiler jvm c++ assembler concurrency Java RISC-V assembly x86 algorithm cache mips Lecture 4: Instructions: Language of
$25