2020 Winter Term 2
Unit 1e Procedures and Stack
1
CPSC 213 2020 ST1 2020 Jonatan Schroeder
Reading: Companion: 2.8; Textbook: 3.7, 3.12
Learning Goals
explain when local variables are allocated and freed
distinguish a procedures return address from its return argument
describe why activation frames are allocated on the stack and not on the heap
explain how to use the stack pointer to access local variables and arguments
given an arbitrary C procedure, describe the format of its stack activation frame
explain the role of each of the caller and callee prologues and epilogues
explain the tradeoffs involved in deciding whether to pass arguments on the stack or in registers
describe the necessary conditions for not saving the return address on the stack and the benefit of not doing so
write assembly code for procedure call arguments passed on the stack or in registers, with and without a return value
write assembly code for a procedure with and without local variables, with arguments pass on the stack or in registers, with and without a return value
write assembly code to access a local scalar, local static array, local dynamic array, local static struct, and local dynamic struct
describe how a buffer-overflow, stack-smash attack occurs
CPSC 213 2020 ST1 2020 Jonatan Schroeder 2
void b (int a0, int a1) {
int l0 = 0;
int l1 = 1;
}
Can l0, l1, a0, a1 be allocated statically?
A. Yes,always
B. Yes, but only if b doesnt call itself directly
C. Yes, but only if foo doesnt call any functions
D. No, none of these can be allocated statically at all
CPSC 213 2020 ST1 2020 Jonatan Schroeder 3
How many different versions of n are there? void a (int n) {
if (n == 0) return;
a(n 1);
printf(%d
, n);
}
What if there is no apparent recursion? void b (int n) {
int l = n * n;
c(l 1);
}What if c( ) calls b( )?
CPSC 213 2020 ST1 2020 Jonatan Schroeder 4
Scope
Local variables are only accessible within declaring procedure Each execution has its own private copy
Lifetime
Allocated when procedure starts
Freed when procedure returns (in most languages)
CPSC 213 2020 ST1 2020 Jonatan Schroeder 5
Activation: execution of a procedure
Starts with procedure is called, ends when it returns
There can be many activations of same procedure alive at once
Activation Frame
memory that stores activations state Includes local variables and arguments
Should we allocate activation frames from the heap? Call malloc() to create frame on procedure call?
Call free() on procedure return?
CPSC 213 2020 ST1 2020 Jonatan Schroeder 6
Order of frame allocation and deallocation is special freed in reverse order of allocation
Simple allocation for frames:
Reserve big chunk of memory for all frames
Initial address known
Simple, cheap allocation: add or subtract from a pointer
Questions
What data structure is this like?
What restriction do we place on lifetime of local variables?
CPSC 213 2020 ST1 2020 Jonatan Schroeder 7
r5
Stack of activation frames
Stored in memory, grows up from bottom
Stack pointer
General purpose register (we will use r5)
Stores base address (address of first byte) of current frame
Current frame is the top of the stack
First activation is the bottom or base of the stack
Local variables of initial function (e.g., main)
Stack Top (current frame)
Code
Static data
Heap
Stack
Stack Bottom (first frame)
CPSC 213 2020 ST1 2020 Jonatan Schroeder
8
r5
Value of the stack pointer is dynamic
Local variables and arguments
Size of each frame is (usually) static Offset from stack pointer is (usually) static
Each frame is like a struct
Base of frame is in r5 (stack pointer)
Each variable in procedure is a member of the struct
Stack Top (current frame)
Code
Static data
Heap
Stack
Stack Bottom (first frame)
CPSC 213 2020 ST1 2020 Jonatan Schroeder
9
Local variables Arguments
Some architectures use registers for arguments
Return address (ra)
Register r6 works for one call, but not for multiple calls
Saved only if needed (i.e., only if function calls other functions)
Other saved registers
Called function may change register values Values that must be kept after call are saved
Also, if function needs more registers than available
saved registers
local variables
return address
arguments
saved registers
CPSC 213 2020 ST1 2020 Jonatan Schroeder 10
Order of storage
Decided by the compiler
Based on order convenient for stack creation
Static offset to any member from its base (frame pointer) There will need to be some convention for interoperability
Example:
void b (int a0, int a1) {
int i0 = 0; int i1 = 1; c();
}
r[5]+0x00: i0 0x04: i1
0x08: return address 0x0c: a0
0x10: a1
saved registers
local variables
return address
arguments
saved registers
CPSC 213 2020 ST1 2020 Jonatan Schroeder
11
Frame is stored and accessed similarly to a struct Base is in r5 (by convention)
Offset is known statically (order by convention)
Example: int i0 = 0; int i1 = 1;
ld $0, r0
str0,(r5) #i0=0 ld $1, r0
str0,4(r5) #i1=1
r[5]+0x00: i0 0x04: i1
0x08: return address
0x0c: a0
0x10: a1
CPSC 213 2020 ST1 2020 Jonatan Schroeder
12
What would be the code for the following? A. ld 16(r5), r0
i1 = a1;
r[5]+0x00: i0 0x04: i1
0x08: return address
0x0c: a0
0x10: a1
st r0, 4(r5)
B. ld 10(r5), r0 st r0, 0(r5)
C. ld 16(r5), r0 st r0, 0(r5)
D. ld 10(r5),r0 st r0, 4(r5)
E. Follow the white rabbit, Neo
CPSC 213 2020 ST1 2020 Jonatan Schroeder
13
What is the value of l in foo when it is active?
void goo() { int x = 3; } void foo() { int l; }
goo(); foo();
A. 0
B. 3
C. 42
D. NULL
E. no way to know
CPSC 213 2020 ST1 2020 Jonatan Schroeder
14
What is wrong with this?
int *foo() {
int l;
return &l }
void bar() {
int *l = malloc(100);
}
CPSC 213 2020 ST1 2020 Jonatan Schroeder
15
Latest version of C allows local arrays to have dynamic size: void foo(int n) {
int a[n]; int b;
b = 0;
}
What code does the compiler generate for the last statement?
CPSC 213 2020 ST1 2020 Jonatan Schroeder 16
The allocation of the stack is relatively simple: Decrement the stack pointer to allocate Increment the stack pointer to free
Whose responsibility is it to allocate in the stack? Caller doesnt [need to] know the size of callees stack However, caller needs to copy the arguments
Division of labour:
Caller allocates space for arguments Callee allocates space for the rest
CPSC 213 2020 ST1 2020 Jonatan Schroeder 17
Code that executes just before procedure starts Part in caller before call
Part in callee at beginning of call
Generated by the compiler for procedure call
Allocates activation frame and changes stack pointer Subtract frame size from stack pointer r5
Possibly saves register values Assigns values to allocated arguments
CPSC 213 2020 ST1 2020 Jonatan Schroeder 18
Code that executes when procedure ends Part in callee just before return
Part in caller just after returning
Deallocates activation frame and restores stack pointer Add frame size to stack pointer r5
Possibly restores some saved register values
CPSC 213 2020 ST1 2020 Jonatan Schroeder 19
Caller prologue (before call) Allocate stack space for arguments Save actual argument values to stack
Callee prologue (start of function code)
Allocate stack space for return address and local variables Save return address to stack, if needed
Callee epilogue (end of function code)
Load return address from stack if stored there
Deallocate stack space of return address and local variables
Caller epilogue (after return from call) Deallocate stack space of arguments
CPSC 213 2020 ST1 2020 Jonatan Schroeder 20
void foo() { b(0, 1);
}
void b(int a0, int a1) {
int i0 = a0; int i1 = a1; c();
}
CPSC 213 2020 ST1 2020 Jonatan Schroeder 21
foo: deca r5 # allocate foos stack void foo() {
st r6, 0(r5)# save ra on stack
}
ld $-8, r0 b(0, 1);
add r0, r5 ld $0, r0 st r0, 0(r5) ld $1, r0
# size of bs arguments # allocate bs arguments
# a0 = 0
void b(int a0, int a1) { st r0, 4(r5) # a1 = 1
Caller Prologue
Call Caller Epilogue
}
inca r5 # free foos stack j 0(r6) # return
gpc $6, r6 # set return address int i0 = a0;
j b # call function b
inldt i$81, r=0 a1;# size of bs arguments
addr0, r5 # free bs arguments
c();
ld 0(r5), r6 # load return address from stack
CPSC 213 2020 ST1 2020 Jonatan Schroeder 22
void foo() {
b(0, 1);
}
void b(int a0, int a1) {
int i0 = a0; int i1 = a1; c();
}
b: ld $-12, r0 void foo() {
# size of bs frame Callee Prologue # allocate bs frame
# store return address in stack
add r0, r5
b(0, 1); st r6, 8(r5)
ld 12(r5), r0
}
voidld b(16i(nr5t), ar0, int a1) {
# i1 = a1
# set return address
# call c()
# load return address from stack # size of bs frame
# free bs frame
# return
}
Callee Epilogue
st r0, 0(r5)
# i0 = a0
st r0, 4(r5)
int i0 = a0; gpc $6, r6
j c
int i1 = a1;
ld 8(r5), r6
c();
ld $12, r0
add r0, r5 j 0(r6)
Callees Body
CPSC 213 2020 ST1 2020 Jonatan Schroeder
23
void foo() {
b(0, 1);
}
void b(int a0, int a1) {
int i0 = a0; int i1 = a1; c();
}
Every program thread starts with a hidden procedure Name varies, usually start or entrypoint, sometimes crt0
The start procedure:
Allocates memory for stack
Initializes the stack pointer
Calls the threads first procedure (e.g., main())
start: ld $stackBottom, r5
inca r5
gpc$6, r6
jmain
halt
.pos 0x1000
stackTop:
.long 0x0
.long 0x0
stackBottom: .long 0x0
CPSC 213 2020 ST1 2020 Jonatan Schroeder
24
What is the value of r5 in three(), assuming we begin with a call to foo()?
void three() {
int i;
int j; int k;
}
void two() {
int i; int j; three();
void one() {
int i;
two(); }
void foo() { // r5 = 2000 one();
}
}
CPSC 213 2020 ST1 2020 Jonatan Schroeder
25
Number and size of arguments is statically determined
Value of arguments is dynamically determined
Compiler may choose to put arguments on the stack
Caller pushes argument values onto stack, callee reads arguments
from stack
Sometimes compiler chooses to use registers for arguments
Often used in Intel-based programs
Caller places argument values in pre-determined registers (convention), callee reads arguments from these registers
Why is this done? When is it a good idea? When is this not a good idea?
CPSC 213 2020 ST1 2020 Jonatan Schroeder 26
Return value
In C and Java, procedures/methods can return only a single value C compilers use a designated register (e.g., r0) for this
Return address does not always need to be saved to the stack
When is this possible?
Local variables sometimes dont need to be in the stack
Why? When is this possible?
CPSC 213 2020 ST1 2020 Jonatan Schroeder 27
int s;
void foo() {
s = add(1, 2);
}
void add(int a, int b) {
return a + b;
}
foo: deca r5
st r6, (r5)
ld $-8, r0 add r0, r5 ld $1, r0 st r0, 0(r5) ld $2, r0 st r0, 4(r5) gpc $6, r6
j add
ld $8, r1 add r1, r5 ld $s, r1 st r0, (r1) ld (r5), r6 inca r5
j (r6)
foo: deca r5
st r6, (r5)
ld $1, r0
ld $2, r1
gpc $6, r6 j add
ld $s, r1
st r0, (r1)
ld (r5), r6
inca r5
j(r6)
add:
Prologue Epilogue
add: ld ld
0(r5), r0
4(r5), r1
r1, r0
Argument loading
add
j (r6)
add r1, r0 j (r6)
CPSC 213 2020 ST1 2020 Jonatan Schroeder
28
Stack is managed by code generated by compiler
Grows from bottom up, allocated by subtracting stack pointer
Local variables and arguments accessed with static offset from stack pointer
Caller prologue
Allocates space on stack (or registers) for arguments, stores their values
Callee prologue
Allocates space on stack for local variables and saved registers
Callee epilogue
Deallocates stack frame, restores saved registers
Caller epilogue
Deallocates arguments on stack, get return value from r0
CPSC 213 2020 ST1 2020 Jonatan Schroeder 29
proc:
deca r5
st r6, (r5) ld 8(r5), r1 beq r1, L0 dec r1
ld 4(r5), r2 deca r5
deca r5
st r2, (r5) st r1, 4(r5) gpc $6, r6
j proc
inca r5
inca r5
ld 4(r5), r2
ld 8(r5), r1 dec r1
ld (r2,r1,4), r1 add r1, r0
br L1
L0: ld $0, r0 L1: ld (r5), r6
inca r5 j (r6)
CPSC 213 2020 ST1 2020 Jonatan Schroeder
30
Can you spot the bug?
int main(int argc, char **argv) { u.is_admin = 0;
printf(Hello, %s
, argv[1]); strcpy(u.username, argv[1]); if(u.is_admin) {
struct user {
char username[16]; char is_admin;
} u;
printf(Hello admin: %c!!
, u.is_admin);
system(/bin/sh); } else {
printf(Bye %s!
, u.username);
}
}
CPSC 213 2020 ST1 2020 Jonatan Schroeder 31
What about here?
void getInput (void) { char buf[1000]; for(int i=0; ; i++) {
char ch = getchar(); if(ch ==
) break; buf[i] = ch;
}
printf(Read %s.
, buf); }
int main (void) {
puts(Starting.);
getInput();
puts(Done.);
}
CPSC 213 2018 WT1 2018 Jonatan Schroeder
32
Program code is just bytes in memory
Program data is just bytes in memory
This means we can include data that can be interpreted as code
Injected code can do anything
Most attackers want an interactive shell inject
shellcode
If the input is longer than 1000 bytes, the loop will write into memory beyond the end of buf
If an attacker can provide the input, it can determine what values are written into memory beyond the end of buf
buf[0] buf[1] buf[2] buf[3] buf[4] buf[5] buf[6] buf[7] buf[999] ra
CPSC 213 2020 ST1 2020 Jonatan Schroeder
33
The return address is
stored in memory on the stack
the address of instruction that executes after return changing it changes the programs control flow
Attackers goal
introduce code into the program from outside trick program into running it
Changing the return address
allows attacker to change control flow
if it points to data, then that data becomes the program
buf[0] buf[1] buf[2] buf[3] buf[4] buf[5] buf[6] buf[7] buf[999] ra
CPSC 213 2020 ST1 2020 Jonatan Schroeder
34
Hard parts
determining location of return address in attack string determining address to change return address to
Making it easier
approximate return address value
start string with nop instructions (padding) aka nop slide, nop sled or nop ramp
finally include the code for the worm Insert many copies of return address
One of them may hit the right spot
Works if return address guess within nop slide
nop nop nop nop nop nop nop nop nop nop nop nop Get shell 0x1234 0x1234 0x1234 0x1234 0x1234
buf[0] buf[1] buf[2] buf[3] buf[4] buf[5] buf[6] buf[7] buf[993] buf[994] buf[995] buf[996] buf[997] buf[998] buf[999] ra
CPSC 213 2020 ST1 2020 Jonatan Schroeder
35
This attack exploited buffer overflow in IIS (Microsofts Web Server)
Infected 359,000 machines within14 hours
Estimated cost: $2.6B
Displayed string: HELLO! Welcome to http://www.worm.com! Hacked by Chinese!
Was to launch DOS attack on whitehouse.gov, but detected; White House changed IP address to avoid attack
GET /default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u9090%u6858%ucbd 3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003% u8b00%u531b%u53ff%u0078%u0000%u00=a
CPSC 213 2020 ST1 2020 Jonatan Schroeder 36
Buffer boundary check What if stack grew down?
Active frame at the highest addresses
Modern protections
Non-executable stack
Stack Canaries
ASLR: Randomized memory (and stack) addresses
CPSC 213 2020 ST1 2020 Jonatan Schroeder 37
CPU instruction that signals OS to do something
Typically something that regular processes dont have permission
to do
Examples: read/write terminal, read/write file, execute programs Details: CPSC 313
Like function calls
Arguments passed in registers r0, r1, r2
CPSC 213 2020 ST1 2020 Jonatan Schroeder 38
New requirement: system call
Instruction includes which system call to use
Remaining arguments passed by registers r0, r1, r2 Return value saved in r0
New instruction:
sys $0: read(fd, buffer, size) read data from fd (0 = stdin)
returns number of read bytes or -1 on error
sys $1: write(fd, buffer, size) write data to fd (1 = stdout)
returns number of written bytes or -1 on error sys $2: exec(buffer, size) execute program
returns 0 if successful or -1 on error
CPSC 213 2020 ST1 2020 Jonatan Schroeder 39
Name
Semantics
Assembly
Machine
system call
system call #n
sys $n
f1nn
.pos 0x1000
ld $1, r0
ld $str, r1 ld $12, r2 sys $1
halt
.pos 0x2000 str:
.long 0x68656c6c # hell .long 0x6f20776f # o wo .long 0x726c640a # rld
CPSC 213 2020 ST1 2020 Jonatan Schroeder 40
In assignment 7, you will write a real exploit
You will use a program that has a vulnerability
You will also write malicious code
You will have the vulnerable program execute your code
Attacker input must include the code
You will enter machine code as data in input string
CPSC 213 2020 ST1 2020 Jonatan Schroeder 41
Meets every Tuesday at 5pm on Discord https://ubcctf.github.io/ (invite there)
Can email Robert Xiao for more info: [email protected]
CPSC 213 2020 ST1 2020 Jonatan Schroeder 42
Recursive functions have tail recursion if the recursion is the last operation of the function:
void foo(int a, int b) { if (b == 0)
return a; else
return foo(a+1, b-1);
}
CPSC 213 2020 ST1 2020 Jonatan Schroeder 43
Careful, this is not tail recursion: void fact(int n) {
if (n == 0) return 1;
return n * fact(n-1); }
But this is:
void fact(int n, int acc) {
if (n == 0) return acc;
return fact(n-1, n*acc); }
CPSC 213 2020 ST1 2020 Jonatan Schroeder 44
A tail recursion can be converted to a loop: void fact(int n, int acc) {
if (n == 0) return acc;
return fact(n-1, n*acc); }
void fact_it(int n, int acc) {
while(1) {
if (n == 0) return acc;
n = n-1; acc = n*acc;
}
}
CPSC 213 2020 ST1 2020 Jonatan Schroeder 45
sum:
deca r5
st r6, (r5)
ld 0(r5), r1 # r1=n ld 4(r5), r1 # r1=n
ld 4(r5), r0 # r0=acc
ld 8(r5), r0 # r0=acc
void sum(int n, int acc) { if (n == 0) return acc; return sum(n-1, n+acc);
}
L1:
beq r1, L0
add r1, r0
dec r1
deca r5
deca r5
# if n==0 return #r0=n+acc #r1=n-1
st r1, 0(r5) # arg0 = n-1
st r0, 4(r5) # arg1 = n+acc
gpc $6, r6
br L1 j sum
inca r5
# call sum
inca r5 L0:
ld (r5), r6 # restore r.a. inca r5
j (r6) # return
CPSC 213 2020 ST1 2020 Jonatan Schroeder
46
sum: ld 0(r5), r1 # r1=n
ld 4(r5), r0 # r0=acc
L1:beq r1, L0 # if n==0 return
add r1, r0 dec r1
br L1
L0: j (r6)
# r0 = n+acc # r1 = n-1
# return
void sum(int n, int acc) { if (n == 0) return acc; return sum(n-1, n+acc);
}
CPSC 213 2020 ST1 2020 Jonatan Schroeder
47
Activation frame for caller replaced with callees, function call replaced with regular jump or branch
Less use of stack
Fewer memory accesses
Fewer calls to indirect jumps (important for pipelining, see CPSC 313) Less instructions overall
Can be done with tail call to other functions Adjustments may need to be made to activation frame size
Optimization may be done by the compiler itself Only works if its an actual tail call
No operation performed after the call
CPSC 213 2020 ST1 2020 Jonatan Schroeder 48
Global variables:
address known statically, instruction uses address explicitly
Reference variables (pointers): variable stores address of value may be allocated dynamically
Arrays
elements named by numeric index
address of element is base plus index times size of element base and index can be static or dynamic; size is static
Instance variables
Offset to variable from start of object/struct known statically Start of object/struct may be dynamic
Local variables and arguments
Offset to variable from start of activation frame known statically Address of stack frame (stack pointer) is dynamic
CPSC 213 2020 ST1 2020 Jonatan Schroeder 49
Reviews
There are no reviews yet.