
A tested implementation of Peterson's lock algorithm?

Does anyone know of a good/correct implementation of Peterson's Lock algorithm in C? I can't seem to find this. Thanks.

asked Jul 21 '12 by Dervin Thunk

People also ask

How does Peterson's algorithm work?

Peterson's Algorithm is used to synchronize two processes. It uses two variables: a bool array flag of size 2 and an int variable turn. In the producer/consumer formulation, i represents the consumer and j the producer. Initially both flags are false.

What is Peterson's solution discuss the critical section problem using Peterson's solution?

Peterson's solution is a classic solution to the critical section problem, which requires that no two processes modify a shared resource at the same time. For example, let int a = 5, and suppose there are two processes p1 and p2 that can both modify the value of a.

Can Peterson's algorithm result in deadlock?

However, in Peterson's solution a deadlock can never happen, because the process that sets the turn variable first is guaranteed to enter the critical section. Therefore, if a process is preempted just after setting turn in the entry section, it will still get into the critical section on its next chance.

Is Peterson lock fair?

You are right, the Peterson algorithm for two threads is fair (aka first come first served).


2 Answers

Peterson's algorithm cannot be implemented correctly in C99 (which has no memory model for threads), as explained in "Who ordered memory fences on an x86?".

Peterson's algorithm is as follows:

LOCK (left column: thread id, right column: thread other):

interested[id] = 1                interested[other] = 1
turn = other                      turn = id

while turn == other               while turn == id
  and interested[other] == 1        and interested[id] == 1


UNLOCK:

interested[id] = 0                interested[other] = 0

There are some hidden assumptions here. To begin with, each thread must note its interest in acquiring the lock before giving away its turn, and giving away the turn must make it visible to the other thread that we are interested in acquiring the lock.

Also, as in every lock, memory accesses in the critical section cannot be hoisted past the lock() call, nor sunk past the unlock(). I.e.: lock() must have at least acquire semantics, and unlock() must have at least release semantics.

In C11, the simplest way to achieve this would be to use a sequentially consistent memory order, which makes the code run as if it were a simple interleaving of threads running in program order (WARNING: totally untested code, but it's similar to an example in Dmitriy V'jukov's Relacy Race Detector):

#include <stdatomic.h>

static atomic_int interested[2];  /* both 0 initially: nobody is interested */
static atomic_int turn;           /* initial value does not matter */

void lock(int id)
{
    atomic_store(&interested[id], 1);
    atomic_store(&turn, 1 - id);

    /* spin while the other thread is interested and priority is with it */
    while (atomic_load(&turn) == 1 - id
           && atomic_load(&interested[1 - id]) == 1);
}

void unlock(int id)
{
    atomic_store(&interested[id], 0);
}

This ensures that the compiler doesn't make optimizations that break the algorithm (by hoisting/sinking loads/stores across atomic operations), and emits the appropriate CPU instructions to ensure the CPU doesn't break the algorithm either. The default memory order for C11/C++11 atomic operations that don't explicitly select one is memory_order_seq_cst, i.e. the sequentially consistent order.
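
For instance, these two stores are equivalent (a trivial, illustrative snippet; the variable x is hypothetical):

#include <stdatomic.h>

atomic_int x;

void f(void)
{
    /* The plain form defaults to memory_order_seq_cst. */
    atomic_store(&x, 1);
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
}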

C11/C++11 also support weaker memory orders, allowing as much optimization as possible. The following is a C11 translation of Anthony Williams' C++11 translation of an algorithm originally written by Dmitriy V'jukov in the syntax of his own Relacy Race Detector [petersons_lock_with_C++0x_atomics] [the-inscrutable-c-memory-model]. If this algorithm is incorrect it's my fault (WARNING: also untested code, but based on good code from Dmitriy V'jukov and Anthony Williams):

#include <stdatomic.h>

static atomic_int interested[2];  /* both 0 initially */
static atomic_int turn;

void lock(int id)
{
    atomic_store_explicit(&interested[id], 1, memory_order_relaxed);
    atomic_exchange_explicit(&turn, 1 - id, memory_order_acq_rel);

    while (atomic_load_explicit(&interested[1 - id], memory_order_acquire) == 1
           && atomic_load_explicit(&turn, memory_order_relaxed) == 1 - id);
}

void unlock(int id)
{
    atomic_store_explicit(&interested[id], 0, memory_order_release);
}

Notice the exchange with acquire and release semantics. An exchange is an atomic RMW operation, and atomic RMW operations always read the last value stored before the write of the RMW operation. Also, an acquire on an atomic object that reads a write from a release on that same atomic object (or any later write on that object from the thread that performed the release, or any later write from any atomic RMW operation) creates a synchronizes-with relation between the release and the acquire.
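
As an illustrative sketch of such a synchronizes-with edge (hypothetical writer/reader functions, not part of the lock):

#include <assert.h>
#include <stdatomic.h>

static int data;
static atomic_int ready;

void writer(void)  /* runs on one thread */
{
    data = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

void reader(void)  /* runs on another thread */
{
    /* If this acquire load reads the 1 stored by the release store,
       everything sequenced before the store (data = 42) happens-before
       everything sequenced after the load, so the assert cannot fire. */
    if (atomic_load_explicit(&ready, memory_order_acquire) == 1)
        assert(data == 42);
}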

So this operation is a synchronization point between the threads: there is always a synchronizes-with relationship between the exchange in one thread and the last exchange performed by any thread (or the initialization of turn, for the very first exchange).

So we have a sequenced-before relationship between the store to interested[id] and the exchange from/to turn, a synchronizes-with relationship between two consecutive exchanges from/to turn, and a sequenced-before relationship between the exchange from/to turn and the load of interested[1 - id]. This amounts to a happens-before relationship between accesses to interested[x] in different threads, with turn providing the synchronization between threads. This forces all the ordering needed to make the algorithm work.
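
To see the lock in use, a minimal harness might look like this (untested sketch, same caveats as above; it assumes a libc that ships C11 <threads.h>, and that the lock()/unlock() and shared variables from one of the versions above live in the same file):

#include <stdio.h>
#include <threads.h>

void lock(int id);
void unlock(int id);

static int counter;  /* protected by the Peterson lock */

static int worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < 100000; i++) {
        lock(id);
        counter++;  /* critical section */
        unlock(id);
    }
    return 0;
}

int main(void)
{
    thrd_t t[2];
    int ids[2] = { 0, 1 };
    thrd_create(&t[0], worker, &ids[0]);
    thrd_create(&t[1], worker, &ids[1]);
    thrd_join(t[0], NULL);
    thrd_join(t[1], NULL);
    printf("counter = %d (expect 200000)\n", counter);
    return 0;
}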

So how were these things done before C11? It involved using compiler- and CPU-specific magic. As an example, let's look at the pretty strongly-ordered x86. IIRC, all x86 loads have acquire semantics, and all stores have release semantics (save the non-temporal moves in SSE, used precisely to achieve higher performance at the cost of occasionally needing to issue CPU fences to achieve coherence between CPUs). But this is not enough for Peterson's algorithm. As Bartosz Milewski explains in "Who ordered memory fences on an x86?", for Peterson's algorithm to work we need to establish an ordering between the accesses to turn and interested; failing to do that may result in seeing loads from interested[1 - id] before writes to interested[id], which breaks the algorithm.

So a way to do that with GCC on x86 would be (WARNING: although I tested something similar to the following, actually a modified version of the code at wrong-implementation-of-petersons-algorithm, testing is nowhere near enough to assure the correctness of multithreaded code):

static int interested[2];  /* both 0 initially */
static int turn;

void lock(int id)
{
    interested[id] = 1;
    turn = 1 - id;
    /* full fence; the "memory" clobber also keeps the compiler from
       moving memory accesses across it */
    __asm__ __volatile__("mfence" ::: "memory");

    do {
        /* compiler barrier: forces turn and interested to be re-read */
        __asm__ __volatile__("" ::: "memory");
    } while (turn == 1 - id
             && interested[1 - id] == 1);
}

void unlock(int id)
{
    /* compiler barrier: keep critical-section accesses from sinking below
       the unlocking store (x86 stores already have release semantics at
       the CPU level) */
    __asm__ __volatile__("" ::: "memory");
    interested[id] = 0;
}

The MFENCE prevents stores and loads to different memory addresses from being reordered. Otherwise the write to interested[id] could be queued in the store buffer while the load of interested[1 - id] proceeds. On many current microarchitectures an SFENCE may be enough, since it may be implemented as a store buffer drain, but IIUC SFENCE doesn't need to be implemented that way, and may simply prevent reordering between stores. So SFENCE may not be enough everywhere, and we need a full MFENCE.
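
The hazard is the classic store-buffering pattern (an illustrative litmus test, not part of the lock code):

/* x and y both start at 0, thread0() and thread1() run concurrently.
   Without an MFENCE between each thread's store and load, both r0 and
   r1 can end up 0: each store can still be sitting in its core's store
   buffer when the other core's load executes, so the load reads the
   stale 0 from memory. */
static volatile int x, y;
static int r0, r1;

void thread0(void) { x = 1; /* MFENCE needed here */ r0 = y; }
void thread1(void) { y = 1; /* MFENCE needed here */ r1 = x; }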

The compiler barrier (__asm__ __volatile__("":::"memory")) prevents the compiler from deciding that it already knows the value of turn. We're telling the compiler that we've clobbered memory, so all values cached in registers must be reloaded from memory.
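
Without it, the compiler could legally compile the spin loop as if the shared variables were read only once (a hypothetical, standalone illustration with ordinary non-volatile globals):

int turn;
int interested[2];

void spin(int id)
{
    /* The compiler may load each variable once into a register... */
    int t = turn;
    int i = interested[1 - id];
    /* ...and then spin forever on the stale register copies. */
    while (t == 1 - id && i == 1)
        ;
}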

P.S: I feel this needs a closing paragraph, but my brain is drained.

answered by ninjalj

I won't make any assertions about how good or correct the implementation is, but it was tested (briefly). This is a straight translation of the algorithm described on Wikipedia.

#include <assert.h>

struct petersonslock_t {
    volatile unsigned flag[2];  /* flag[p] != 0: thread p wants the lock */
    volatile unsigned turn;     /* which thread waits if both want it */
};
typedef struct petersonslock_t petersonslock_t;

petersonslock_t petersonslock () {
    petersonslock_t l = { { 0U, 0U }, ~0U };
    return l;
}

void petersonslock_lock (petersonslock_t *l, int p) {
    assert(p == 0 || p == 1);
    l->flag[p] = 1;
    l->turn = !p;
    while (l->flag[!p] && (l->turn == !p)) {}
}

void petersonslock_unlock (petersonslock_t *l, int p) {
    assert(p == 0 || p == 1);
    l->flag[p] = 0;
}

Greg points out that on an SMP architecture with a slightly relaxed memory model (such as x86), although loads from the same memory location are ordered, a store to one location followed by a load from a different location can appear reordered to the other processor, which is exactly the reordering that breaks the algorithm above.

Jens Gustedt and ninjalj recommend modifying the original algorithm to use the atomic_flag type: setting the flags and turn would use atomic_flag_test_and_set, and clearing them would use atomic_flag_clear, both from C11. Alternatively, a memory barrier could be imposed between the updates to flag.

Edit: I originally attempted to correct for this by writing all the state to the same memory location. ninjalj pointed out that the bitwise operations turn the state updates into RMW operations rather than the loads and stores of the original algorithm, so atomic bitwise operations are required. C11 provides such operations, as does GCC with built-ins. The algorithm below uses GCC built-ins, but wrapped in macros so that it can easily be changed to some other implementation. However, modifying the original algorithm above is the preferred solution.

#include <assert.h>

/* All the lock state is packed into a single word:
     bits  0- 7 (0x0000FF): turn (0 or 1)
     bits  8-15 (0x00FF00): flag[0]
     bits 16-23 (0xFF0000): flag[1] */
struct petersonslock_t {
    volatile unsigned state;
};
typedef struct petersonslock_t petersonslock_t;

#define ATOMIC_OR(x,v)   __sync_or_and_fetch(&x, v)
#define ATOMIC_AND(x,v)  __sync_and_and_fetch(&x, v)

petersonslock_t petersonslock () {
    petersonslock_t l = { 0x000000U };
    return l;
}

void petersonslock_lock (petersonslock_t *l, int p) {
    assert(p == 0 || p == 1);
    unsigned mask = (p == 0) ? 0xFF0000 : 0x00FF00;       /* other's flag */
    ATOMIC_OR(l->state, (p == 0) ? 0x000100 : 0x010000);  /* set own flag */
    /* set turn = !p: OR in 1 for turn = 1, AND out the low byte for turn = 0 */
    (p == 0) ? ATOMIC_OR(l->state, 0x000001) : ATOMIC_AND(l->state, 0xFFFF00);
    /* wait while the other thread is interested and priority is with it */
    while ((l->state & mask) && (l->state & 0x0000FF) == !p) {}
}

void petersonslock_unlock (petersonslock_t *l, int p) {
    assert(p == 0 || p == 1);
    ATOMIC_AND(l->state, (p == 0) ? 0xFF00FF : 0x00FFFF); /* clear own flag */
}
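
For what it's worth, a minimal smoke test could look like the following (an untested sketch using POSIX threads; this is not the brief testing referred to above, and the harness works for either version of the lock):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Assumes petersonslock_t and its functions from above are in scope. */
static petersonslock_t lk;
static int counter;  /* protected by the lock */

static void *worker(void *arg)
{
    int p = (int)(intptr_t)arg;
    for (int i = 0; i < 100000; i++) {
        petersonslock_lock(&lk, p);
        counter++;  /* critical section */
        petersonslock_unlock(&lk, p);
    }
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    lk = petersonslock();
    pthread_create(&t0, NULL, worker, (void *)(intptr_t)0);
    pthread_create(&t1, NULL, worker, (void *)(intptr_t)1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("counter = %d (expect 200000)\n", counter);
    return 0;
}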

answered by jxh