ECS 150 System Calls
Prof. Joel Porquet-Lupine
UC Davis 2020/2021
Copyright 2017-2021 Joel Porquet-Lupine, CC BY-NC-SA 4.0 International License
C Standard Library
C program example
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd;
    char buf[4];

    if (argc < 2)
        exit(EXIT_FAILURE);

    memset(buf, 0, sizeof(buf));

    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    read(fd, buf, sizeof(buf));
    close(fd);

    printf("Executable detection... ");
    if (buf[0] == 0x7F && buf[1] == 0x45
        && buf[2] == 0x4C && buf[3] == 0x46)
        printf("ELF");
    else if (buf[0] == '#' && buf[1] == '!')
        printf("script");
    else
        printf("Not an executable!");
    printf("\n");

    return 0;
}
exec_detector.c
Execution
$ make exec_detector
cc -o exec_detector exec_detector.c
$ ./exec_detector
$ ./exec_detector /path/to/nofile
open: No such file or directory
$ ./exec_detector exec_detector
Executable detection... ELF
$ ./exec_detector exec_detector.c
Executable detection... Not an executable!
$ ./exec_detector /bin/firefox
Executable detection... script
$ cat /bin/firefox
#!/bin/sh
exec /usr/lib/firefox/firefox "$@"
$ ./exec_detector /usr/lib/firefox/firefox
Executable detection... ELF
C Standard Library
Library functions
Declaration
Access via inclusion of headers
Function prototypes
Global variables (e.g., errno)
Type definitions (e.g., size_t)
Macros (e.g., O_RDONLY)
Definition
Actual code via library
Linked at compile-time
Dynamically loaded at runtime
#include <...>
memset(buf, 0, sizeof(buf));
fd = open(argv[1], O_RDONLY);
if (fd < 0) {
    perror("open");
    exit(EXIT_FAILURE);
}
read(fd, buf, sizeof(buf));
close(fd);
...
printf("Executable detection... ");
...
printf("\n");
exec_detector.c
Categories of functions
1. No privileged operation to perform
2. Always needs to request privileged operation from OS
Known as system call, or syscall
3. Sometimes needs to request privileged operation from OS
C Standard Library
1. Regular functions
Usage example
memset() only needs to have access to array buf
buf is already part of the memory (defined in stack)
No need for special operations
char buf[4];
...
memset(buf, 0, sizeof(buf));
exec_detector.c
Implementation example
Generic C implementation
void *memset(void *s, int c, size_t count)
{
    char *xs = s;

    while (count--)
        *xs++ = c;
    return s;
}
C Standard Library
2. (Always) privileged functions
Usage example
fd = open(argv[1], O_RDONLY);
...
read(fd, buf, sizeof(buf));
close(fd);
exec_detector.c
Need for special privileges: e.g.,
Verify that file exists on a physical medium accessible by computer (e.g., hard-drive, SD card, network, etc.)
Check that current user has permission to open it with specified mode
Actually read file's data from physical medium
Sensitive operations passed to OS
Accessible via syscalls
Implementation example
Specific to Linux
#include <...>
#include <...>
ssize_t read(int fd, void *buf,
             size_t count)
{
    long r = syscall(SYS_read,
                     fd, buf, count);
    if (r < 0) {
        errno = -r;
        return -1;
    }
    return r;
}
Specific to x86_64 processors
static __inline long __syscall3(long n, long a1,
                                long a2, long a3)
{
    unsigned long ret;
    __asm__ __volatile__ ("syscall" : "=a"(ret)
                          : "a"(n), "D"(a1),
                            "S"(a2), "d"(a3)
                          : "rcx", "r11", "memory");
    return ret;
}
C Standard Library
3. (Sometimes) privileged functions
Usage example
printf("Executable detection... ");
...
printf("\n");
exec_detector.c
printf() prints to stdout, which is internally buffered
Flushed when buffer is full, or when encountering a newline ('\n')
Flushing requires the OS to actually write the characters
Use of (syscall) function write()
Printf vs write
printf("Hello ");
sleep(2);
printf("world!\n");
write(STDOUT_FILENO, "Hello ", 6);
sleep(2);
write(STDOUT_FILENO, "world!\n", 7);
printf_write.c
$ ./printf_write
Hello world!
Hello world!
System calls
Definition
Specific CPU instruction
Immediate transfer of control to kernel code
Purpose
Secure API between user applications and OS kernel
Main categories
Process management
Files and directories
Pipes
Signals
Memory management
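[Figure: the four numbered steps of a read() call. In user mode, the user program pushes count, buf and fd and calls read(); the C library function read loads SYS_read into rax, fd into rdi and buf into rsi, then executes syscall. Across the system call interface, in kernel mode, the OS kernel's exception handler dispatches to the syscall handler, which runs sys_read() before control returns to user mode.]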
Process management
Definition of a process
A process is a program in execution
Each process is identified by its Process ID (PID)
Each process runs in its own memory space
Each process is represented in the OS by a Process Control Block (PCB)
Data structure storing information about process
PID, state, CPU register copies for context switching, open files, etc.
[Figure: Processes 1, 2 and 3 run in user space and reach the kernel through the syscall API; the kernel keeps one PCB per process (PID=1, PID=2, PID=3), each holding state, context (ctxt) and open files.]
Process management
Main related functions/syscalls
Process creation and execution
fork(): Create a new (clone) process
exec(): Change executed program within running process
Process termination
exit(): End running process
wait()/waitpid(): Wait for a child process and collect exit code
Process identification
getpid(): Get process PID
getppid(): Get parent process PID
Process management
fork()
Running process gets cloned into a child process
Child gets an (almost) identical copy of parent
Same open files, command line arguments, memory, stack, etc.
Child resumes at the fork() as well
Example
int a = 42;

int main(int argc, char *argv[])
{
    int b = 23;

    printf("Hello world!\n");
    fork();
    printf("My favorite number is %d.\n",
           argc + a + b);
    return 0;
}
fork_101.c
$ ./fork_101
Hello world!
My favorite number is 66.
My favorite number is 66.
Process management
Distinguishing between parent and child
fork() returns a value
PID of the child to the parent
zero to the child
-1 to the parent in case of error
Example
int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid > 0)
        printf("I'm the parent!\n");
    else if (pid == 0)
        printf("I'm the child!\n");
    else
        printf("I'm the initial process!\nBut something went wrong\n");
    printf("I'm here now, bye!\n");
    return 0;
}
fork_201.c
$ ./fork_201
I'm the parent!
I'm here now, bye!
I'm the child!
I'm here now, bye!
This exact output is not guaranteed
Parent and child are independent processes
Scheduling up to OS
Process management
Fork illustrated
[Figure, three steps: (1) Process 1 calls fork() and enters the kernel, which holds its PCB (PID=1; state, ctxt, files). (2) The kernel duplicates that PCB into a new PCB (PID=2), creating Process 2. (3) fork() returns in both processes: 2 (the child's PID) in Process 1, and 0 in Process 2.]
Recap
C standard library
Non-privileged functions
E.g., memset()
Always/sometimes privileged functions
E.g., read()/printf()
Require system call
Syscall categories
Process management
Files and directories
Pipes
Signals
Memory management
Process management
A process is a program in execution
PCB data structure
fork()
pid_t fork(void);
Clones parent process into child process
Both return from fork call
Return value distinguishes between parent and child
[Figure: the read() system call path from user mode to kernel mode, repeated from the "System calls" slide.]
Process management
exec()
Current process starts executing another program
Family of functions, with slight variations
exec[lv]p?e?() (see man page for details)
Example
int main(void)
{
    char *cmd = "/bin/echo";
    char *args[] = { cmd, "ECS150", NULL };
    int ret;

    printf("Hi!\n");
    ret = execv(cmd, args);
    printf("Execv returned %d\n", ret);
    return 0;
}
execv.c
$ ./execv
Hi!
ECS150
Call to exec() functions never returns if it succeeds!
Otherwise returns -1 and continues
Process management
exit()
Termination of the current process
Ability to return an exit value
Example
Exit at any time during execution
if (error)
    exit(1);
Or return from main()
int main(void)
{
    return 0;
}
Libc transparently exits
exit(main(argc, argv));
Usage
$ ls /
$ echo $?
0
$ ls /nodir
ls: cannot access /nodir: No such file or directory
$ echo $?
2
$ if ! ls /nodir >& /dev/null; then echo Expected; fi
Expected
Process management
wait()/waitpid()
wait() makes parent wait for any of its children to exit
Parent is blocked from execution in the meantime
waitpid() enables more advanced options, such as
Specify PID of child to wait for
Don't block even if no child has exited yet
Example
pid = fork();
if (pid != 0) {
    /* Parent */
    int status;

    wait(&status);
    /* == waitpid(pid, &status, 0) */
    printf("Child returned %d\n",
           WEXITSTATUS(status));
} else {
    /* Child */
    printf("Will exit soon!\n");
    exit(42);
}
wait.c
$ ./wait
Will exit soon!
Child returned 42
Output order guaranteed
Scheduling constrained by blocking call
Process management
Putting it together: fork() + exec() + wait() ≈ system()
Somewhat equivalent to what function system() does internally
system("/bin/echo ECS150");
But with a lot more control!
int main(void)
{
    pid_t pid;
    char *cmd = "/bin/echo";
    char *args[] = { cmd, "ECS150", NULL };

    pid = fork();
    if (pid == 0) {
        /* Child */
        execv(cmd, args);
        perror("execv");
        exit(1);
    } else if (pid > 0) {
        /* Parent */
        int status;

        waitpid(pid, &status, 0);
        printf("Child returned %d\n",
               WEXITSTATUS(status));
    } else {
        perror("fork");
        exit(1);
    }
    return 0;
}
fork_exec_wait.c
$ ./fork_exec_wait
ECS150
Child returned 0
Process management
The shell
A shell is a user interface to run commands in the terminal
Typically makes heavy use of process-related functions
Naive pseudo-implementation
while (1) {                         /* Repeat forever */
    char **command;
    int status;

    display_prompt();               /* Display prompt in terminal */
    read_command(&command);         /* Read input from terminal */
    if (!fork()) {                  /* Fork off child process */
        exec(command);              /* Execute command */
        perror("execv");            /* Coming back here is an error */
        exit(1);
    } else {
        /* Parent */
        waitpid(-1, &status, 0);    /* Wait for child to exit */
    }
}
Extra features
Background jobs (&), redirections (< and >) for connecting stdin or stdout of the child to files (instead of the terminal), pipes (|) for connecting stdin or stdout of the child to other processes, and many more.
Process management
getpid()/getppid()
Notion of a (family) tree of processes
Only one parent per process
But possibly multiple children
In Unix, init (PID=1) is ultimate ancestor
Only process created from scratch by kernel (and not by forking)
getpid() returns process PID
getppid() returns its parent's PID
[Figure: example process tree with PIDs 1, 2, 3, 5, 7 and 8.]
Example
int main(void)
{
    if (fork() > 0)
        /* Forces parent to wait for child
         * to force scheduling order */
        wait(NULL);
    printf("My PID is %d\n", getpid());
    printf("My parent's PID is %d\n", getppid());
    return 0;
}
getpid.c
$ ./getpid
My PID is 406782
My parent's PID is 406781
My PID is 406781
My parent's PID is 162474
$ echo $$
162474
System calls
Process management
Files and directories
Pipes
Signals
Memory management
Files and directories
Concepts
Files and directories in tree-like structure called Virtual File System
Internal nodes are directories, leaf nodes are files
Every directory contains a list of filenames
Every file contains an array of bytes
[Figure: Virtual File System tree rooted at /, with entries such as bin, boot, etc, home, usr and var, and below them grub, initrd.img, vmlinuz and sample files (aaa, aab, abc, xyz, z99).]
VFS can aggregate files and directories from various physical media (local hard-drive, remote network share, etc.)
Files and directories
Main related functions/syscalls
File interaction
open(): open (create) file and return file descriptor
close(): close file descriptor
read(): read from file
write(): write to file
lseek(): move file offset
File descriptor management
dup()/dup2(): duplicate file descriptor
File characteristics
stat()/fstat(): get file status
Directory traversal
getcwd(): get current working directory
chdir(): change directory
opendir(): open directory
closedir(): close directory
readdir(): read directory
Files and directories
Basic file interaction
open() returns a file descriptor (FD), used for all interactions with file
Closed by close() when done with file
read()/write() operations are sequential, tracked by file offset
lseek() can manipulate current file offset
Example
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd;
    char c;

    fd = open("file_101.c", O_RDONLY);
    read(fd, &c, 1);
    printf("%c\n", c);
    read(fd, &c, 1);
    printf("%c\n", c);
    lseek(fd, -2, SEEK_END);
    read(fd, &c, 1);
    printf("%c\n", c);
    close(fd);
    return 0;
}
file_101.c
$ ./file_101
#
i
}
Files and directories
File descriptors
Definition
Table of open files per process
Part of PCB
(Duplicated upon forking)
FDs are simple indexes in the table
Example
int fd1, fd2;

fd1 = open("file_101.c", O_RDONLY);
fd2 = open("file_201.c", O_RDWR);
printf("fd1 = %d\n", fd1);
printf("fd2 = %d\n", fd2);
file_201.c
$ ./file_201
fd1 = 3
fd2 = 4
[Figure: after both open() calls, the FD table in the process's PCB (PID=X; state, ctxt, files) has entry 3 referring to file_101.c (mode RO, offset 0) and entry 4 referring to file_201.c (mode RW, offset 0).]
Allocation
open() always returns the first (lowest-numbered) available FD
close(fd1);
fd1 = open("file_201.c", O_WRONLY);
printf("fd1 = %d\n", fd1);
file_201.c
$ ./file_201
fd1 = 3
Recap
Process management
fork()
Clone process
exec()
Execute different program inside current process
wait()
Parent waits for child processes to terminate
Collect return value
Files and directories
File descriptors
[Figure: FD table of a process after fd1 = open("file_101.c", O_RDONLY) and fd2 = open("file_201.c", O_RDWR), repeated from the "File descriptors" slide: entry 3 refers to file_101.c (RO, offset 0) and entry 4 to file_201.c (RW, offset 0).]
Files and directories
Standard streams
Initially, three open file descriptors per process
0: standard input (STDIN_FILENO)
1: standard output (STDOUT_FILENO)
2: standard error (STDERR_FILENO)
[Figure: FD table entries 0 (RO), 1 (WO) and 2 (WO) all refer to the terminal; the process calls read(0, buf, 8); and write(1, "Hello!", 6);.]
Example
write(STDOUT_FILENO, "Hello ", 6);
write(1, "Hello ", 6);
fprintf(stdout, "Hello ");
printf("Hello\n");
write(STDERR_FILENO, "World ", 6);
write(2, "World ", 6);
fprintf(stderr, "World\n");
standard_fds.c
$ ./standard_fds
Hello Hello Hello Hello
World World World
Redirections
Can connect standard streams to other targets than the terminal
$ ./standard_fds > /dev/null
World World World
$ ./standard_fds 2> /dev/null | tr H J
Jello Jello Jello Jello
$ ./standard_fds >& myfile.txt
$ cat myfile.txt
Hello Hello World World World
Hello Hello
Files and directories
File descriptor manipulation
dup2() replaces an open file descriptor with another
Prefer it to dup(), which cannot choose the target FD
Example
int main(void)
{
    int fd;

    printf("Hello #1\n");
    fd = open("myfile.txt",
              O_WRONLY | O_CREAT,
              0644);
    dup2(fd, STDOUT_FILENO);
    close(fd);
    printf("Hello #2\n");
    return 0;
}
dup2.c
[Figure: effect of dup2(fd, STDOUT_FILENO). FD 1, which referred to the terminal, now refers to myfile.txt (WO, offset 0), just like FD 3; FDs 0 (RO) and 2 (WO) still refer to the terminal.]
$ ./dup2
Hello #1
$ cat myfile.txt
Hello #2
Files and directories
File characteristics
stat() returns information about a file, from a filename
Same with fstat(), but from an FD
Example
struct stat {
    dev_t     st_dev;        /* ID of device containing file */
    ino_t     st_ino;        /* Inode number */
    mode_t    st_mode;       /* File type and mode */
    nlink_t   st_nlink;      /* Number of hard links */
    uid_t     st_uid;        /* User ID of owner */
    gid_t     st_gid;        /* Group ID of owner */
    dev_t     st_rdev;       /* Device ID (if special file) */
    off_t     st_size;       /* Total size, in bytes */
    blksize_t st_blksize;    /* Preferred blocksize for I/O */
    blkcnt_t  st_blocks;     /* Number of 512B blocks allocated */
    struct timespec st_atim; /* Time of last access */
    struct timespec st_mtim; /* Time of last modification */
    struct timespec st_ctim; /* Time of last status change */
};
int main(int argc, char *argv[])
{
    struct stat sb;

    stat(argv[1], &sb);
    printf("File type: ");
    switch (sb.st_mode & S_IFMT) {
    case S_IFDIR: printf("directory\n");    break;
    case S_IFREG: printf("regular file\n"); break;
    default:      printf("other\n");        break;
    }
    printf("Mode: %lo (octal)\n",
           (unsigned long) sb.st_mode);
    printf("File size: %lld bytes\n",
           (long long) sb.st_size);
    printf("Last file access: %s",
           ctime(&sb.st_atime));
    return 0;
}
stat.c
$ ./stat stat.c
File type: regular file
Mode: 100644 (octal)
File size: 644 bytes
Last file access: Fri Sep 18 16:32:02 2020
$ ./stat .
File type: directory
Mode: 40755 (octal)
File size: 4096 bytes
Last file access: Fri Sep 18 11:59:49 2020
Files and directories
Directory traversal
getcwd()/chdir() to access the current working directory
opendir()/closedir()/readdir() to access the entries of a directory
Example
int main(int argc, char *argv[])
{
    char cwd[PATH_MAX];
    DIR *dirp;
    struct dirent *dp;

    getcwd(cwd, sizeof(cwd));
    printf("Change CWD from %s to %s\n",
           cwd, argv[1]);
    chdir(argv[1]);
    dirp = opendir(".");
    while ((dp = readdir(dirp)) != NULL)
        printf("Entry: %s\n", dp->d_name);
    closedir(dirp);
    return 0;
}
dir_scan.c
$ pwd
/home/jporquet
$ ./dir_scan /
Change CWD from /home/jporquet to /
Entry: ..
Entry: lib
Entry: home
Entry: etc
Entry: root
[...]
Entry: proc
Entry: tmp
Entry: dev
Entry: bin
Entry: sbin
Entry: mnt
Entry: .
Entry: usr
System calls
Process management
Files and directories
Pipes
Signals
Memory management
Pipe
Definition
Inter-process communication (IPC)
Pipeline of processes chained via their standard streams
stdout of one process connected to stdin of next process
Example
$ du -sh * | sort -h -r | head -3
1.2G ecs150
555M ecs36c
386M ecs30
Details
Internally implemented as anonymous files
Circular memory buffer of fixed size
Accessible via file descriptors and regular read/write transfers
Processes run concurrently, implicitly synchronized by communication
[Figure: three processes, each with its own FD table (entries 0 RO, 1 WO, 2 WO). FD 1 of Process #1 and FD 0 of Process #2 refer to one pipe; FD 1 of Process #2 and FD 0 of Process #3 refer to a second pipe; the remaining standard FDs refer to the terminal.]
Pipe
pipe()
Create a pipe and return two file descriptors via array
[0] for reading access, [1] for writing access
Example
int main(void)
{
    int fd[2];
    char send[7] = "Hello!";
    char recv[7];

    pipe(fd);
    printf("fd[0] = %d\n", fd[0]);
    printf("fd[1] = %d\n", fd[1]);
    write(fd[1], send, 7);
    read(fd[0], recv, 7);
    puts(recv);
    return 0;
}
pipe.c
[Figure: after int fd[2]; pipe(fd);, FD table entries 0 (RO), 1 (WO) and 2 (WO) refer to the terminal, while entries 3 (RO) and 4 (WO) refer to the pipe.]
$ ./pipe
fd[0] = 3
fd[1] = 4
Hello!
Pipe
Process pipeline example
Pseudo-code setting up process1 | process2
void pipeline(char *process1, char *process2)
{
    int fd[2];

    pipe(fd);
    if (fork() != 0) {  /* Parent */
        /* No need for read access */
        close(fd[0]);
        /* Replace stdout with pipe */
        dup2(fd[1], STDOUT_FILENO);
        /* Close now unused FD */
        close(fd[1]);
        /* Parent becomes process1 */
        exec(process1);
    } else {            /* Child */
        /* No need for write access */
        close(fd[1]);
        /* Replace stdin with pipe */
        dup2(fd[0], STDIN_FILENO);
        /* Close now unused FD */
        close(fd[0]);
        /* Child becomes process2 */
        exec(process2);
    }
}
System calls
Process management
Files and directories
Pipes
Signals
Memory management
Signals
Definition
Form of inter-process communication (IPC)
Software notification system
From process' own actions: e.g., SIGSEGV (Segmentation fault)
From external events: e.g., SIGINT (Ctrl-C)
About 30 different signals (see man 7 signal)
Default action
In case process does not define specific signal handling
Terminate process: e.g., SIGINT, SIGKILL*
Terminate process and generate core dump: e.g., SIGSEGV
Ignore signal: e.g., SIGCHLD
Stop process: e.g., SIGSTOP*
Continue process: e.g., SIGCONT
Handling or ignoring
Possible to change default action (but not for all signals*!)
Ignore signals
Mask of blocked signals
Set signal handlers
Function to be run upon signal delivery
Signals
Main related functions/syscalls
Sending signals
raise(): Send signal to self
kill(): Send signal to other process
alarm()/setitimer(): Set timer for self
Receive signal (SIGALRM/SIGVTALRM) when timer is up
Blocking signals
sigprocmask(): Examine or change signal mask
sigpending(): Examine pending blocked signals
Receiving signals
sigaction(): Map signal handler to signal
Also signal(), but usage not recommended
pause(): Suspend self until signal is received
Signals
Example
void alarm_handler(int signum)
{
    printf("\nBeep, beep, beep!\n");
}

int main(void)
{
    struct sigaction sa;
    sigset_t ss;

    /* Block Ctrl-C (SIGINT) */
    sigemptyset(&ss);
    sigaddset(&ss, SIGINT);
    sigprocmask(SIG_BLOCK, &ss, NULL);

    /* Set up handler for alarm */
    sa.sa_handler = alarm_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, NULL);

    /* Configure alarm */
    printf("Alarm in 5 seconds\n");
    alarm(5);

    /* Wait until signal is received */
    pause();

    /* Bye, ungrateful world */
    raise(SIGKILL);
    return 0;
}
signal.c
$ ./signal
Alarm in 5 seconds
^C^C^C^C
Beep, beep, beep!
zsh: killed ./signal
System calls
Process management
Files and directories
Pipes
Signals
Memory management
Memory
Division of labor
User C library
malloc()/free() for dynamic memory allocation
Heap memory segment (at the end of data segment)
Fine-granularity management only
When heap is full, syscall to kernel to request more
OS/Kernel
Memory management at page level
Allocation of big chunks (many pages) to user library
Related functions/syscalls
sbrk()/brk()
Increase size of data segment
Old way of allocating heap space
Legacy function now
mmap()
Map pages of memory in process address space
Can also map a file's contents
Extremely versatile and powerful function