Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Atomicity, Volatility and Thread Safety in Windows

It's my understanding of atomicity that it's used to make sure a value will be read/written in whole rather than in parts. For example, a 64-bit value that is really two 32-bit DWORDs (assume x86 here) must be atomic when shared between threads so that both DWORDs are read/written at the same time. That way one thread can't read half variable that's not updated. How do you guarantee atomicity?

Furthermore it's my understanding that volatility does not guarantee thread safety at all. Is that true?

I've seen it implied many places that simply being atomic/volatile is thread-safe. I don't see how that is. Won't I need a memory barrier as well to ensure that any values, atomic or otherwise, are read/written before they can actually be guaranteed to be read/written in the other thread?

So for example let's say I create a thread suspended, do some calculations to change some values to a struct available to the thread and then resume, for example:

HANDLE hThread = CreateThread(NULL, 0, thread_entry, (void *)&data, CREATE_SUSPENDED, NULL);
data->val64 = SomeCalculation();
ResumeThread(hThread);

I suppose this would depend on any memory barriers in ResumeThread? Should I do an interlocked exchange for val64? What if the thread were running, how does that change things?

I'm sure I'm asking a lot here but basically what I'm trying to figure out is what I asked in the title: a good explanation for atomicity, volatility and thread safety in Windows. Thanks

like image 791
loop Avatar asked Jan 24 '15 07:01

loop


2 Answers

it's used to make sure a value will be read/written in whole

That's just a small part of atomicity. At its core it means "uninterruptible", an instruction on a processor whose side-effects cannot be interleaved with another instruction. By design, a memory update is atomic when it can be executed with a single memory-bus cycle. Which requires the address of the memory location to be aligned so that a single cycle can update it. An unaligned access requires extra work, part of the bytes written by one cycle and part by another. Now it is not uninterruptible anymore.

Getting aligned updates is pretty easy, it is a guarantee provided by the compiler. Or, more broadly, by the memory model implemented by the compiler. Which simply chooses memory addresses that are aligned, sometimes intentionally leaving unused gaps of a few bytes to get the next variable aligned. An update to a variable that's larger than the native word size of the processor can never be atomic.

But much more important are the kind of processor instructions you need to make threading work. Every processor implements a variant of the CAS instruction, compare-and-swap. It is the core atomic instruction you need to implement synchronization. Higher level synchronization primitives, like monitors (aka condition variables), mutexes, signals, critical sections and semaphores are all built on top of that core instruction.

That's the minimum, a processor usually provide extra ones to make simple operations atomic. Like incrementing a variable, at its core an interruptible operation since it requires a read-modify-write operation. Having a need for it be atomic is very common, most any C++ program relies on it for example to implement reference counting.

volatility does not guarantee thread safety at all

It doesn't. It is an attribute that dates from much easier times, back when machines only had a single processor core. It only affects code generation, in particular the way a code optimizer tries to eliminate memory accesses and use a copy of the value in a processor register instead. Makes a big, big difference to code execution speed, reading a value from a register is easily 3 times faster than having to read it from memory.

Applying volatile ensures that the code optimizer does not consider the value in the register to be accurate and forces it to read memory again. It truly only matters on the kind of memory values that are not stable by themselves, devices that expose their registers through memory-mapped I/O. It has been abused heavily since that core meaning to try to put semantics on top of processors with a weak memory model, Itanium being the most egregious example. What you get with volatile today is strongly dependent on the specific compiler and runtime you use. Never use it for thread-safety, always use a synchronization primitive instead.

simply being atomic/volatile is thread-safe

Programming would be much simpler if that was true. Atomic operations only cover the very simple operations, a real program often needs to keep an entire object thread-safe. Having all its members updated atomically and never expose a view of the object that is partially updated. Something as simple as iterating a list is a core example, you can't have another thread modifying the list while you are looking at its elements. That's when you need to reach for the higher-level synchronization primitives, the kind that can block code until it is safe to proceed.

Real programs often suffer from this synchronization need and exhibit Amdahls' law behavior. In other words, adding an extra thread does not actually make the program faster. Sometimes actually making it slower. Whomever finds a better mouse-trap for this is guaranteed a Nobel, we're still waiting.

like image 87
Hans Passant Avatar answered Sep 21 '22 23:09

Hans Passant


In general, C and C++ don't give any guarantees about how reading or writing a 'volatile' object behaves in multithreaded programs. (The 'new' C++11 probably does since it now includes threads as part of the standard, but tradiationally threads have not been part of standard C or C++.) Using volatile and making assumptions about atomicity and cache-coherence in code that's meant to be portable is a problem. It's a crap-shoot as to whether a particular compiler and platform will treat accesses to 'volatile' objects in a thread-safe way.

The general rule is: 'volatile' is not enough to ensure thread safe access. You should use some platform-provided mechanism (usually some functions or synchronisation objects) to access thread-shared values safely.

Now, specifically on Windows, specifically with the VC++ 2005+ compiler, and specifically on x86 and x64 systems, accessing a primitive object (like an int) can be made thread-safe if:

  1. On 64- and 32-bit Windows, the object has to be a 32-bit type, and it has to be 32-bit aligned.
  2. On 64-bit Windows, the object may also be a 64-bit type, and it has to be 64-bit aligned.
  3. It must be declared volatile.

If those are true, then accesses to the object will be volatile, atomic and be surrounded by instructions that ensure cache-coherency. The size and alignment conditions must be met so that the compiler makes code that performs atomic operations when accessing the object. Declaring the object volatile ensures that the compiler doesn't make code optimisations related to caching previous values it may have read into a register and ensures that code generated includes appropriate memory barrier instructions when it's accessed.

Even so, you're probably still better off using something like the Interlocked* functions for accessing small things, and bog standard synchronisation objects like Mutexes or CriticalSections for larger objects and data structures. Ideally, get libraries for and use data structures that already include appropriate locks. Let your libraries & OS do the hard work as much as possible!

In your example, I expect you do need to use a thread-safe access to update val64 whether the thread is started yet or not.

If the thread was already running, then you would definitely need some kind of thread-safe write to val64, either using InterchangeExchange64 or similar, or by acquiring and releasing some kind of synchronisation object which will perform appropriate memory barrier instructions. Similarly, the thread would need to use a thread-safe accessor to read it as well.

In the case where the thread hasn't been resumed yet, it's a bit less clear. It's possible that ResumeThread might use or act like a synchronisation function and do the memory barrier operations, but the documentation doesn't specify that it does, so it is better to assume that it doesn't.

References:

On atomicity of 32- and 64- bit aligned types... https://msdn.microsoft.com/en-us/library/windows/desktop/ms684122%28v=vs.85%29.aspx

On 'volatile' including memory fences... https://msdn.microsoft.com/en-us/library/windows/desktop/ms686355%28v=vs.85%29.aspx

like image 21
Wuggy Avatar answered Sep 22 '22 23:09

Wuggy