3 main purpose:
when to waste resource
main difficult of writing a real-time system
distinction between kernel mode and user mode
following instruction should be previledged, in kernel mode
set value of timer
clear memory
turn off interrupts
modify entries in device-status table
access I/O device
kernel mode in ring 0, user mode in ring 3
x86 GPR EAX EBX ECX counter in loops EDX EDI destination in string/memory ops ESI source in string/memory ops ESP stack pointer EBP base frame pointer CR0 paging on/off CR2 linear address that caused a page fault CR3 base address of paging data structure CR4 hardware virtualization config DR0-7 memory breakpoints
EFLAGS
ZF zero SF sign CF carry OF overflow
jump table
IDT change with reboots
XOR reg, reg
REP/REPNE prefix
STOS/SCAS
calling conventions CDECL STDCALL FASTCALL
function prologue/epilogue
sum = addme(x,y)
push ebp
move ebp, esp
movsx eax, word ptr [ebp+8]
movsx ecx, word ptr [ebp+0Ch]
add eax, ecx
mov esp, ebp
pop ebp
retn
frame pointer omission: skip EBP
x64
RIP-relative addressing
one calling convention
contains the addresses of all the service routines
asynchronous :Triggered by an event from a “device”
condition code bits set by comparion instructions
CPU priority
mode, user/kernel
Synchronous: triggered by “trap instruction” for syscall
Side-effect of executing a trap in userspace is that an “exception” is raised and program execution continues at a prescribed instruction in the kernel
Synchronous: triggered by a “fault condition” of an instruction condition
include
address space
process table entries, state, registers. open files, threads state, resources.
parts
text section, program code
program counter, processor registers
stack: function parameters, return addresses, local variables
data section, global variables
heap: memory dynamically allocated
address virtualization
protect private sections
readonly, readwrite, execute
save and load PCB when interrupt or system call
process state(running, waiting)
program counter(location of next instruction)
CPU registers
CPU scheduling information, priorities, scheduling queue pointers
memory allocated
CPU used, clock time, time limites
IO devices, open files
pid t_pid; /* process identifier */
long state; /* state of the process */
unsigned int time_slice /* scheduling information */
struct task_struct *parent; /* this process’s parent */
struct list_head children; /* this process’s children */
struct files_struct *files; /* list of open files */
struct mm_struct *mm; /* address space of this process */
job queue, all processes
ready queue, residing in main memory, ready to execute
device queue, waiting for IO
CPU bound, IO bound
short-term scheduler(CPU scheduler), milisec, what to exec next
long-term scheduler(job scheduler), sec, what to put to ready queue
medium-term scheduler, remove, store, recover, control degree of multiprogramming
Process Control Block
parent fork()
children, and exec()
children, wait till children’s termination
sharing options, share all, share subset, share none
execution options, execute concurrently, or parent watis until children terminate
pid_t pid = fork();
if (pid == 0) {
// child process
execv(path, executablename);
} else if (pid > 0) {
// parent process
waitpid(pid, &status, option);
} else {
// fork failed
}
ask os to delete itself by calling exit()
return data from child to parent using wait()
parent use abort()
to terminate child
cascade termination, child existence is dependent on parent
zombie, if no parent waiting
orphan, if parent terminate without calling wait()
pass to registers
block(linux/solaris), save addresses in register
stack, pushed by program, popped by OS
system calls are an extension of ABI(Application Binary Interface)
definition agreed upon by libc and kernel
implemented as assembler largely taking the arguments already in the right registers and TRAP-ing into the kernel
and run a peice of assembler code:
check the syscall number is in range
change stack to kernel
arguments already in place
call to syscall_table[registers.syscall_number]
switch back from kernel stack to user stack and RFI
The compiler associates the syscall number with the kernel internal function
create file, delete file
open, close
read, write, reposition
get, set file attributes
request, release device
read, write, reposition
get, set device attributes
attach, detach devices
get, set time or date
get, set system data
get, set process, file, device attributes
crate, delete communication connection
send, receive messages, to host name, or process name, from client to server
create, gain access to memory regions
transfer status info
attach and detach remote devices
control access
get, set permissions
allow, deny user access
create, delete, copy, rename, print, dump, list
data, time, memory space, disk space, number of users
performance, logging, debugging
format and print to terminals
registry - store and retrive configuration info
create, modify, search content, transform text
compilers, assemblers, debuggers, interpreters
absolute loaders, relocatable loaders, linkage editors, overlay loaders, debugging systems
create virtual connection among processes, users, computer systems. absolute loaders
launch at boot time, disk checking, process scheduling, error logging, printing
subsystems, daemons
new, running, waiting, ready, terminated
ready
running
waiting
terminate
running to wait, terminate are non-preemptive, all others are preemptive, caused by access to shared data, preemption in kernel mode, interrupt during crucial os activities.
give control of CPU to the process selected by scheduler
context switch
switch to user mode
jmp to the location
dispatch lantency: $t_{start new proc} - t_{stop one proc}$ = confict phase(real-time CPU scheduling) + dispatch phase
preempt process running in kernel mode
release resources needed by high prio processes
CPU utilization
throughtput
turnaround time, amount of time to execute a particular process
waiting time
response time
short process behind long process
use aging(increase prio as time progresses) to solve starvation(low prio never get served)
80% of CPU bursts should be shorter than q
scheduling between queues
fixed prio, serve all from foreground and then background
time slice, 80% to fg, 20% to bg
implement aging, move between queues
process-contention scope
system-contention scope
linux macos only allow pthread_scope_system
homogeneous
asymmetric, only one processor access the data structures
symmetric(SMP), most common; each has self-sheduling, ready queue in common, or each has its own
processor affinity, due to memory locality, process are close to certain processor.
might need move process across processors, either
- push, periodic task check load on processors, and move task across CPUs
- pull, idle processors pulls waiting task fro busy one
soft real time
hard real time
prio assigned based on inverse of period
prio assigned based on deadline
in steady state, processes leaving queue must equal processes arriving
\[n = \lambda \mathbf W\]n: average queue length
W: average waiting time in queue
$\lambda$: average arrival rate into queue
processes are resource containers
threads are unit of execution in a process
threads share code, data, files
threads has own registers, stacks
S: serial portion (parallel or serial)
N: processing cores
N goes to infinite, speedup approaches to $\frac{1}{S}$
many to one
either user or kernel level
POSIX standard IEEE 1003.1c
void *runner(void *param)
pthread_t tid;
pthread_attr_t attr;
pthread_exit(0);
#pragma omp parallel for
^{}
dispatch_queue_t queue = dispatch_get_global_queue(prio_default, 0);
dispatch_async(queue, ^{});
async cancel, terminate immediately
deferred cancel, allow thread periodically check if itself should be cancelled
cancel state either disabled or enabled, cancel default mode is deferred
cancellation is a signal
pthread_testcancel()
cleanup handler
pthread_t tid;
pthread_create(&tid, 0, worker, NULL);
pthread_cancel(tid);
shared memory
message passing
char * name = "this class sucks";
int shm_fd = shm_open(name, O_CREAT | O_RDWR, 0666);
open an existing memory segment to share it
ftruncate(shm fd, 4096)
char * shared_addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
sprintf(shared_addr, "writing to the shared memory");
Special IP address 127.0.0.1 (loopback) to refer to system on which process is running
tcp socket, udp socket, multicast socket in Java
stubs
marshalls
RPCGen
matchmaker
ordinary pipes only access to parent-child relationship, unidirectional
named pipes all access, bidirectional, used for several processes
provide buffer, block, unblock producers and consumers
4kb guaranteed to be atomic
64kb
scheduling, blocking, resource management.
do {
entry section
critical section
exit sectionm
remainder section
} while(true)
mutual exclusion
progress
bounded waiting
load and store are atomic
turn indicates whose turn
flag indicates if ready to enter critical section
do {
flag[i] = true;
turn = j;
while (flag[j] && turn == j);
critical section
flag[i] = false;
remainder section
} while (true);
do { while (test_and_set(&block)); critical section lock = false; remainder section } while(true);
- compare and swap
int compare_and_swap(int value, int expected, int new_value) { int temp = *value; if (value == expected) *value = new_value; return temp; }
do { while(compare_and_swap(&lock, 0, 1) != 0); critical section lock = 0; remainder section } while(true);
## bounded waiting mutual exclusion
do { waiting[i] = true; key = true; while (waiting[i] && key) key = test_and_set(&lock);
waiting[i] = false; /* critical section */ j = (i + 1) % n; while ((j != i) && !waiting[j]) j = (j + 1) % n;
if (j == i) lock = false; else waiting[j] = false; /* remainder section */ } while (true);
## mutex
- acquire
- release
- both must be atomic
- require busy waiting
- thus call a spinlock
acquire() { while (!available); /* busy wait */ available = false; } release() { available = true; }
## semaphore
- wait, P()
- signal, V()
wait(S) { while (S <= 0); // busy wait S–; }
signal(S) { S++; }
- counting semaphore
- binary semaphore
> > **implementation with busy waiting**
- must guarantee no processes run wait or signal of one semaphore at the same time
- thus must be put in critical section, and we have busy waiting
- implementation code is short, so chance of busy waiting is rare.
> > **implementation without busy waiting**
typedef struct { int value; struct process *list; } semaphore;
- block
place the process on the waiting queue
- wakeup
remove the process from the waiting queue, add to ready queue
## deadlock
two or more processes are waiting indefinitely for an event that can be caused by only one of the waiting processes
P_0|P_1
--|--
wait(S);|wait(Q);
wait(Q);|wait(S);
...|...
signal(S);|signal(Q);
signal(Q);|signal(S);
## starvation
A process may never be removed from the semaphore queue in which it is
suspended
### bounded buffer problem
n buffers
- semaphore mutex = 1
- semaphore full = 0
- semaphore empty = n
> > producer
do { // produce an item; wait(empty); wait(mutex);
… // add next produced to the buffer …
signal(mutex); signal(full); } while(true);
> > consumer
do { wait(full); wait(mutex);
… // remove an item from buffer
signal(mutex); signal(empty);
// consumer the item } while(true);
### reader-write problem
if a writer is in the critical section and n readers are waiting, then one reader is queued on rw mutex, and n − 1 readers are queued on mutex
- semaphore rw mutex = 1
- semaphore mutex = 1
- int read count = 0
> > writer
do { wait(rw mutex); … /* writing is performed */ … signal(rw mutex); } while (true);
> > producer
do { wait(mutex); read count++;
if (read count == 1) wait(rw mutex);
signal(mutex); … /* reading is performed */ … wait(mutex); read count–;
if (read count == 0) signal(rw mutex);
signal(mutex); } while (true);
### dining-philosophers problem
five chair, five single chopsticks
when think, does not interact with others, when hungry, pick two around him or her.
allocate several resources among several processes in a deadlock-free and starvation-free manner
> > soluttion with deadlock
do { wait(chopstick[i]); wait(chopstick[(i+1) % 5]); … /* eat for awhile / … signal(chopstick[i]); signal(chopstick[(i+1) % 5]); … / think for awhile */ … } while (true);
> > solution with [monitors](#monitors) and [conditional variables](#conditional-variables)
monitor philosopher-dining-problem { enum {THINKING, HUNGRY, EATING} state[5]; condition self[5];
void pickup(int i) { state[i] = HUNGRY; test[i]; if (state[i] != EATING) { self[i].wait(); } }
void putdown(int i) { state[i] = THINKING; test[i + 1]; test[(i + 4) % 5]; }
void test(int i) { if (state[(i+1)%5] != EATING && state[(i+4)%5] != EATING && state[i] == HUNGRY) { state[i] = EATING; self[i].signal(); } }
initialization_code() { for(int i = 0; i < 5; i++) { state[i] = THINKING; } }
}
DiningPhilosophers.pickup(i);
… eat …
DiningPhilosophers.putdown(i);
## monitors
- high-level abstraction
- internal vars only accessible by the code within the procedure
- only one process may be active within the monitor at a time
monitor name { procedure 1 {}; procedure 2 {}; initialize() {}; }
## monitors implementation
The signaling processes can use `next` to suspend themselves.
An integer variable `next_count` is also provided to count the number of processes suspended on `next`
> external function F
wait(mutex); … body of F … if (next_count > 0) signal(next); else signal(mutex);
> x.wait()
x_count, x_sem both init to 0
x_count++;
if (next_count > 0) signal(next); else signal(mutex);
wait(x_sem); x_count–;
> x.signal()
if (x_count > 0) { next_count++; signal(x_sem); wait(next); // signal and wait next_count–; }
## condition variables
wait and signal between two processes
- x.wait()
- the process invoking this operation is suspended until another process invokes signal
- x.signal()
- resume one of the processes that invoked x.wait()
- if no x.wait(), no effect on the variable.
The x.signal() operation resumes exactly one suspended process.
If no process is suspended, then the signal() operation has no effect
> > x.wait in Q, x.signal in P, P and Q cannot continue simultaneously, thus two options:
- P signal and P wait, until Q leaves the monitor, or for another condition
- P signal and P continue, Q wait until P leaves the monitor, or Q wait for another condition
## single resource
monitor ResourceAllocator { boolean busy; condition x; void acquire(int time) { if (busy) x.wait(time); busy = TRUE; } void release() { busy = FALSE; x.signal(); }
initialization code() { busy = FALSE; } } ```
Prior to kernel Version 2.6, disables interrupts to implement short critical sections
On single-cpu system, spinlocks replaced by enabling and disabling kernel preemption
itrusion detection system mimicking virtual machine mariadb for transaction postgre for spatial
plotly
coreos ggn fleet etcd go-nerve health-check go-synapse