Syncronization

Take, for example, inter-thread communication: each thread operates sequentially, and everything is fine until threads want to share data.

We can implement this with a producer/consumer queue - but what if the buffer is full on produce or empty on consume?

One thing we can do is have the producer block until the queue is not full and consumer block until the queue is not empty - but there’s a problem.

Race Conditions

  • cooperating threads share memory
  • both may r/w the shared memory
  • but, we can’t guarantee that read followed by write is atomic

So, we need to ensure certain operations are atomic that are critical (critical regions)

  • Critical regions provide mutual exclusion and help fix race conditions
  • 4 conditions must hold:
    • no 2 threads may simultaneously be in a critical region
    • no assumptions may be made about speeds or number of cpus
    • no thread running outside its critical region may block another thread
    • a thread may not wait forever to enter its critical region

Spin Lock

One approach is to use a single shared variable to keep track of whose turn it is, and busy wait until it’s a given thread’s turn.

This is called a spin lock - the waiting thread spins in a tight loop reading the turn variable.

This avoids race conditions, but it doesn’t satisfy criterion 3 for critical regions: each thread has to alternate, instead of the next available thread that needs to run the critical region doing it.

Busy Waiting

#define FALSE 0
#define TRUE 1
#define N 2
uint32_t turn;
uint32_t interested[N];
void enter_region(uint32_t thread) {
    uint32_t other = 1 - thread;
    interested[thread] = TRUE;
    turn = thread;                  // my turn to wait
    // wait while the other thread runs
    while (turn == thread && interested[other] == TRUE) {}
}

void leave_region(uint32_t thread) {
    interested[thread] = FALSE;
}

Bakery Algorithm

Multiple-producer syncronization

Notation:

  • <<< is lexographical order on (ticket_number, thread_id)
  • (a, b) <<< (c, d) if (a < c) or ((a == c) and (b < d))
  • \(\max(a_0, a_1, ..., a_{n-1})\) is a number k s.t. \(k \geq a_i\) for all i
uint32_t n;             // number of threads
uint32_t choosing[n];   // initialized to 0
uint32_t number[n];     // initialized to 0

while (1) {
    // get a ticket number higher than anyone else's, using single writer property
    choosing[i] = 1;    // where i is the current thread #
    number[i] = max(number[0]..number[n-1]) + 1
    choosing[i] = 0;
    for (uint32_t j = 0; j < n; j++) {
        // wait while j is choosing a number
        // important in case max() chose the same number, but i ran before j could write to mem
        while (choosing[j]) {}
        // wait while j wants to enter and j <<< i
        while ((number[j] != 0) && ((number[j] < number[i]) || ((number[j] == number[i]) && (j < i)))) {}
    }
    // critical section goes here
    number[i] = 0;
    // rest of code
}

Hardware Sync

Software methods discussed above work, but…

  • may be complex
  • require busy waiting, wasting CPU time
  • several hardware methods (typically in CPU ISA)
    • test and set: read and write in a single instruction
    • atomic swap: swap reg and mem in 1 instruction
  • software only solution: turn off interrupts, so thread won’t be switched out unless it asks to be suspended
  • x86 supports compare-and-swap: most common solution

Variable Lock

  • single shared variable lock
  • 2 versions: test and set, swap
  • works for any number of threads
  • still requires busy waiting, but simpler
  • possible problem with requirements: non-concurrent code can lead to unbounded waiting

TestAndSet:

int32_t lock = 0;
while (1) {
    while (TestAndSet(lock)) {}  // test and set writes 1 to lock and returns what was there in 1 operation
    // critical section
    lock = 0;
    // rest of code
}

swap:

int32_t lock = 0;
while (1) {
    while (Swap(lock, 1)) {}  // swaps 1 with the lock and returns what was there
    // critical section
    lock = 0;
    // rest of code
}

Problems

  • these waste CPU time
  • priority inversion (higher priority thread waits for lower)
    • e.g. if P1 is waiting on P3’s lock, but P2 is running since it’s higher priority

Semaphores

Semaphores are a syncronization mechanism that doesn’t require busy waiting during the entire critical section

  • Semaphore S accessed by 2 atomic operations
    • Down(S): while (S <= 0) {}; S = S-1; aka P()
      • blocks until the value of the semaphore reaches 1 or higher, and decrement it
    • Up(S): S = S+1; aka V()
      • increments the semaphore by 1
  • modify implementation to eliminate busy wait from Down()

Implementation

  • Assume 2 existing operations:
    • Sleep(): suspends current thread
    • Wakeup(T): allows thread T to continue execution
  • Semaphore is a class
    • tracks value of semaphore
    • keeps a list of threads waiting for the semaphore
  • operations still atomic
class Semaphore {
    int32_t value;
    ThreadList tl;
    void down();
    void up();
}

Semaphore::down() {
    value -= 1;
    if (value < 0) {
        // add this thread to tl
        Sleep();
    }
}

Semaphore::up() {
    Thread *T;
    value += 1;
    if (value <= 0) {
        // remove a thread T from tl
        Wakeup(T);
    }
}

General Sync

  • e.g. We want to execute B in T1 only after A executes in T0
    • use a semaphore initialized to 0
    • use up() to notify T1 at the appropriate time
  • this is called a rendezvous
// flag initialized to 0
Semaphore flag;

/* --- thread T0 --- */
...
// execute code for A
flag.up();

/* --- thread T1 --- */
...
flag.down();
// execute code for B

Types

  • counting semaphores
    • value can range over an unrestricted range
  • binary semaphore
    • only 2 values available
    • value 1 means semaphore is available
    • value 0 means a thread has acquired the semaphore

Monitor

  • another kind of high-level sync primitive
    • one monitor has multiple entry points
    • only 1 thread may be in the monitor at any time
    • enforces mutual exclusion: better at avoiding programming errors
  • provided by high-level language
    • variables belonging to monitor are protected from simultaneous access
    • procedures in monitor are guaranteed to have mutual exclusion
  • monitor implementation
    • language/compiler handles implementation
    • can be implemented using semaphores

Example

  • not real code!
  • provides the following features:
    • variables foo, bar, and arr are accessible only by func1 and func2
    • only 1 thread can be executing func1 or func2 at a time
monitor mon {
    int32_t foo;
    int32_t bar;
    double arr[100];
    void func1(...) {}
    void func2(...) {}
    void mon() {}  // init code
}
  • Problem: how can a thread wait inside a monitor?
    • e.g. if a thread in monitor A wants to enter monitor B, monitor A is now blocked
    • solution: use a condition variable
  • condition variables support 2 operations
    • Wait(): suspend this thread until signaled
    • Signal(): wait up exactly one thread waiting on this condition variable
      • if no thread is waiting, no effect
      • signals on condition variables aren’t “saved up”
  • condition variables are only usable within monitors
_images/ex11.png
  • monitors require native language support
  • instead, many languages provide monitor support using special data types and procedures
  • e.g. lock
    • acquiring a lock is like entering a monitor
    • releasing a lock is like leaving a monitor
  • condition variable usage
    • each condition variable is associated with exactly one lock
    • lock must be held to use condition variable
    • waiting on a condition variable releases the lock implicitly
    • returning from Wait() on a condition variable reacquires the lock
_images/ex21.png _images/ex3.png _images/ex4.png

Barriers

  • used for syncronizing multiple threads
  • threads wait at a “barrier” until all in the group arrive
  • after all have arrived, all threads can proceed
  • may be implemented using locks and condition variables
Barrier b; // contains 2 semaphores
b.bsem.value = 0; // for the barrier
b.mutex.value = 1; // for mutual exclusion
b.waiting = 0;
b.maxthread = n;

void HitBarrier(Barrier *b) {
    SemDown(&b->mutex);
    if (++b->waiting >= b->maxthread) {  // if everyone is waiting now, loop and release everyone
        while (--b->waiting > 0) {
            SemUp(&b->bsem);
        }
        SemUp(&b->mutex);
    } else {  // otherwise I start waiting
        SemUp(&b->mutex);
        SemDown(&b->bsem);
    }
}