Is the example in the membarrier man page pointless in x86?

Question

Maybe it's just me, but the example in the man 2 page for membarrier seems pointless.

Basically, membarrier() is an asynchronous memory barrier that, given two coordinating pieces of code (let's call then fast path and slow path) allows you to move all the hardware cost of the the barrier to the slow path, and leave the fast path only with a compiler barrier¹. There are a few different ways to accomplish the membarrier behavior, such as sending an IPI to each involved processor or wait for the code running on each processor to be de-scheduled - but the exact implementation details aren't important here.

Now, here's the example transformation given in the man page:

Original Code

static volatile int a, b;

static void
fast_path(void)
{
   int read_a, read_b;

   read_b = b;
   asm volatile ("mfence" : : : "memory");
   read_a = a;

   /* read_b == 1 implies read_a == 1. */

   if (read_b == 1 && read_a == 0)
       abort();
}

static void
slow_path(void)
{
   a = 1;
   asm volatile ("mfence" : : : "memory");
   b = 1;
}

Transformed Code

(some syscall and init boilerplate omitted)

static volatile int a, b;

static void
fast_path(void)
{
   int read_a, read_b;

   read_b = b;
   asm volatile ("" : : : "memory");
   read_a = a;

   /* read_b == 1 implies read_a == 1. */

   if (read_b == 1 && read_a == 0)
       abort();
}

static void
slow_path(void)
{
   a = 1;
   membarrier(MEMBARRIER_CMD_SHARED, 0);
   b = 1;
}

Here the slow_path is doing two writes (a, then b) separated by a barrier, and the fast_path is doing two reads (b, then a) also separated by a barrier.

The x86 memory model, however, doesn't permit load-load or store-store reordering! So as far as I can tell, membarrier() isn't needed at all in this scenarios and also the mfence wasn't needed in the original code. It seems that simple compiler barriers would have sufficed in both places².

An example that actually makes sense, IMO, should have a store followed by a load, separated by a barrier in the fast path.

Am I missing something?

¹ A compiler barrier prevents movement of loads or stores across it by the compiler (and depending on the implementation may force some register values to memory), but doesn't emit any type of atomic operation or memory fence, so avoid the often order-of-magnitude slowdown inherent in those instructions.

² Of course on weaker platforms, where load-load reordering can occur, the example might make sense, but the example is explicitly x86 and membarrier() is only implemented on x86.

Iwillnotexist Idonotexist · Accepted Answer

You are correct. On x86, this particular use of membarrier() is completely useless. In fact, this exact example is the very first one given in the Intel SDM to illustrate x86 memory ordering rules:

Intel SDM Vol. 3 §8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations

Mathieu Desnoyers · Answer

I'm submitting a fix to this manpage to use a Dekker example instead. See https://lkml.org/lkml/2017/9/18/779

By the way, the membarrier system call is not x86-specific, but rather implement on most Linux architectures now.

Thanks for the feedback!

Mathieu

Is the example in the membarrier man page pointless in x86?

Tags:

linux

x86

memory-model

memory-barriers

Original Code

Transformed Code

BeeOnRope

2 Answers

Iwillnotexist Idonotexist

Mathieu Desnoyers

Recent Activity

Donate For Us

Is the example in the membarrier man page pointless in x86?

Tags:

linux

x86

memory-model

memory-barriers

Original Code

Transformed Code

BeeOnRope

2 Answers

Iwillnotexist Idonotexist

Mathieu Desnoyers

Related questions

Recent Activity

Donate For Us