 

C++11 register cache thread safety

In volatile: The Multithreaded Programmer's Best Friend, Andrei Alexandrescu gives this example:

class Gadget
{
public:
    void Wait()
    {
        while (!flag_)
        {
            Sleep(1000); // sleeps for 1000 milliseconds
        }
    }
    void Wakeup()
    {
        flag_ = true;
    }
    ...
private:
    bool flag_;
};

He states:

... the compiler concludes that it can cache flag_ in a register ... it harms correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever. This is because the change of flag_ will not be reflected in the register that caches flag_.

Then he offers a solution:

If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable.

Now, other people have mentioned on Stack Overflow and elsewhere that the volatile keyword doesn't really offer any thread-safety guarantees, and that I should use std::atomic or mutex synchronization instead, which I agree with.

However, going the std::atomic route, for example, which internally uses read-acquire and write-release memory ordering (Acquire and Release Semantics), I don't see how it actually fixes the register-caching problem in particular.
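
To be concrete, here is roughly what I mean by the std::atomic route: the Gadget example with flag_ changed to std::atomic<bool> and the orderings made explicit. This is my own adaptation, not Alexandrescu's code, and I've swapped Sleep for std::this_thread::sleep_for.

#include <atomic>
#include <chrono>
#include <thread>

class Gadget
{
public:
    void Wait()
    {
        // The acquire load is re-issued on every iteration; the value of
        // flag_ is not cached in a register across iterations.
        while (!flag_.load(std::memory_order_acquire))
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        }
    }
    void Wakeup()
    {
        // The release store makes everything written before it visible to a
        // thread that later performs an acquire load of flag_.
        flag_.store(true, std::memory_order_release);
    }
private:
    std::atomic<bool> flag_{false};
};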

In the case of x86, for example, every load on x86/64 already implies acquire semantics and every store implies release semantics, such that compiled code on x86 doesn't emit any actual memory barrier instructions at all. (The Purpose of memory_order_consume in C++11)

g = Guard.load(memory_order_acquire);
if (g != 0)
    p = Payload;

[Image: the x86-64 machine code Clang generates for the snippet above]

On Intel x86-64, the Clang compiler generates compact machine code for this example – one machine instruction per line of C++ source code. This family of processors features a strong memory model, so the compiler doesn’t need to emit special memory barrier instructions to implement the read-acquire.
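
For completeness, a self-contained version of that snippet might look something like the following; the declarations of Guard and Payload and the Publish side are my own assumption of what the article intends, not code taken from it.

#include <atomic>

static int Payload = 0;            // plain data published by the writer
static std::atomic<int> Guard{0};  // guards access to Payload

// Writer thread: publish Payload, then set Guard with release semantics.
void Publish()
{
    Payload = 42;
    Guard.store(1, std::memory_order_release);
}

// Reader thread: the snippet from the article, with declarations added.
int TryConsume()
{
    int p = 0;
    int g = Guard.load(std::memory_order_acquire);
    if (g != 0)
        p = Payload;  // the acquire load orders this read after it
    return p;
}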

So, assuming the x86 architecture for now, how does std::atomic solve the cache-in-register problem? With no memory barrier instructions emitted for the read-acquire, the compiled code seems to be the same as the compiled code for a plain, regular read.

asked Sep 10 '15 by jason na



2 Answers

Did you notice that there was no load from just a register in your code? There was an explicit memory load from Guard, so it did in fact prevent caching in a register.

Now how it does this is up to the specific platform's implementation of std::atomic, but it must do this.

And, by the way, Alexandrescu's reasoning is completely wrong for modern platforms. While it's true that volatile prevents the compiler from caching in a register, it doesn't prevent similar caching being done by the CPU or by hardware. On some platforms, it might happen to be adequate, but there is absolutely no reason to write gratuitously non-portable code that might break on a future CPU, compiler, library, or platform when a fully-portable alternative is readily available.
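
To make the register-caching point concrete, compare a busy-wait on a plain bool with one on a std::atomic<bool>. This is an illustrative sketch, not code from the question: with the plain flag the compiler is allowed to hoist the load out of the loop, while with the atomic it re-reads memory on every iteration.

#include <atomic>

bool plainFlag = false;
std::atomic<bool> atomicFlag{false};

void SpinOnPlain()
{
    // With a non-atomic, non-volatile flag, the compiler may read plainFlag
    // once into a register and spin forever if that single read saw false.
    while (!plainFlag) { }
}

void SpinOnAtomic()
{
    // The atomic load is performed anew on each iteration; on x86-64 it is
    // still an ordinary mov with no extra barrier instruction, but it is a
    // real memory load rather than a value cached in a register.
    while (!atomicFlag.load(std::memory_order_acquire)) { }
}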

answered Oct 10 '22 by David Schwartz


volatile is not necessary for any "sane" implementation when the Gadget example is changed to use std::atomic<bool>. The reason is not that the standard forbids the use of registers; instead, it requires (§29.3/13 in N3690):

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

Of course, what constitutes "reasonable" is open to interpretation, and it's "should", not "shall", so an implementation could ignore the recommendation without violating the letter of the standard. Typical implementations do not cache the results of atomic loads, nor do they (much) delay issuing an atomic store to the CPU, and thus leave the decision largely to the hardware. If you would like to enforce this behavior, you should use volatile std::atomic<bool> instead. In either case, if another thread sets the flag, Wait() should finish in finite time, but depending on how willing your compiler and/or CPU are, it can still take much longer than you would like.
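
As a minimal sketch of that stricter variant, the Gadget example from the question could look like the following (the std::this_thread::sleep_for call is my substitution for the original Sleep):

#include <atomic>
#include <chrono>
#include <thread>

class Gadget
{
public:
    void Wait()
    {
        while (!flag_.load())  // seq_cst load; re-read from memory each time
            std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    }
    void Wakeup()
    {
        flag_.store(true);     // seq_cst store
    }
private:
    // volatile additionally asks the implementation not to combine or defer
    // the accesses; std::atomic<bool> alone already provides the visibility
    // and ordering guarantees that make Wait()/Wakeup() correct.
    volatile std::atomic<bool> flag_{false};
};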

Also note that a memory fence does not guarantee that a store becomes visible to another thread immediately nor any sooner than it otherwise would. So even if the compiler added fence instructions to Gadget's methods, they wouldn't help at all. Fences are used to guarantee consistency, not to increase performance.

answered Oct 10 '22 by Arne Vogel