I have memory shared between multiple processes, each of which interprets the memory in a certain way. E.g.:
DataBlock {
int counter;
double value1;
double ... }
What I want is for the counter to be updated/incremented atomically, and for a release operation to happen on that address. If I weren't using shared memory, for example, it would be something like
std::atomic<int> counter;
std::atomic_store_explicit(&counter, newvalue, std::memory_order_release); // perform a release operation on the affected memory location, making the write visible to other threads
How do I achieve this for an arbitrary memory location (interpreted to be DataBlock counter above)? I can guarantee the address is aligned as required by the architecture (x86 Linux).
- Make the update atomic - how? (i.e. atomicupdate(addr, newvalue))
- Memory syncing for multicore - (i.e. memorysync(addr)) - the only way I can see is using std::atomic_thread_fence(std::memory_order_release) - but this will "establish memory synchronization ordering of ALL atomic and relaxed atomic stores" - that's overkill for me; I just want the counter location to be synchronized. Appreciate any thoughts.
An operation acting on shared memory is atomic if it completes in a single step relative to other threads. When an atomic store is performed on a shared variable, no other thread can observe the modification half-complete.
Atomic operations in concurrent programming are operations that complete without interference from other threads or processes. Atomic operations are used in many modern operating systems and parallel processing systems.
Atomic instructions bypass the store buffer, or at least act as if they do. They likely do use the store buffer, but they flush it and the instruction pipeline before the load and wait for it to drain afterwards, and they hold a lock on the cache line that they take as part of the load and release as part of the store ...
I can't answer with authority here, but I can give related information that might help.
Mutexes can be created in shared memory and/or created to be cross-process. Pthreads has a special creation flag for this; I can't remember whether that requires shared memory or whether you share a handle. The Linux "futex" can work on shared memory directly (note that the user-space address may differ between processes, but the underlying real address is the same).
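The pthread flag mentioned above is PTHREAD_PROCESS_SHARED, and it does require the mutex itself to live in shared memory. A sketch, assuming an anonymous shared mapping that child processes inherit after fork() (the struct and function names are illustrative; error handling is trimmed):

```cpp
#include <pthread.h>
#include <sys/mman.h>

// A block any related process can map and lock.
struct Shared {
    pthread_mutex_t lock;
    int counter;
};

Shared* create_shared_block() {
    // MAP_SHARED | MAP_ANONYMOUS: shared with children after fork().
    void* mem = mmap(nullptr, sizeof(Shared), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    Shared* s = static_cast<Shared*>(mem);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // The cross-process flag: the mutex works for any process
    // that has this memory mapped.
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    s->counter = 0;
    return s;
}
```

After a fork(), both parent and child can call pthread_mutex_lock(&s->lock) on the same block. For unrelated processes you'd back the mapping with shm_open() instead of MAP_ANONYMOUS.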
Hardware atomics work on memory, not on process variables. That is, your chip doesn't care which programs are modifying the variables, so the lowest-level atomics will naturally be cross-process. The same applies to fences.
C++11 fails to specify cross-process atomics. However, if they are lock-free (check the flag), it is hard to see how a compiler could implement them in a way that wouldn't work cross-process. But you'd be placing a lot of faith in your tool-chain and final platform.
CPU dependency guarantees also track real memory addresses, so as long as your program would be correct in a threaded form it should also be correct in its multi-process form (with respect to visibility).
Kerrek is correct: the abstract machine doesn't really mention multiple processes. However, its synchronization details are written in a way that applies equally to inter-process as to multi-thread use. This relates to #3: it'd be hard for a compiler to screw this up.
Short answer: there is no standards-compliant way to do this. However, leaning on the way the standard defines multi-threading, there are a lot of assumptions you can make with a quality compiler.
The biggest question is whether an atomic can simply be allocated in shared memory (placement new) and work. Obviously this would only work if it is a true hardware atomic. My guess, however, is that with a quality compiler/library the C++ atomics should work fine in shared memory.
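The placement-new idea can be sketched as follows: construct a std::atomic<int> inside an anonymous shared mapping, fork, and let both processes increment it. This leans on the atomic being lock-free, which std::atomic<int> is on x86 (verifiable via is_lock_free()); the function name and count are illustrative:

```cpp
#include <atomic>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Parent and child each perform `n` increments on an atomic
// constructed (via placement new) in a shared mapping; the
// final count should be exactly 2 * n if the atomic really is
// a hardware atomic working cross-process.
int run_shared_atomic_demo(int n) {
    void* mem = mmap(nullptr, sizeof(std::atomic<int>),
                     PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    auto* counter = new (mem) std::atomic<int>(0);  // placement new

    if (fork() == 0) {                              // child process
        for (int i = 0; i < n; ++i)
            counter->fetch_add(1, std::memory_order_release);
        _exit(0);
    }
    for (int i = 0; i < n; ++i)                     // parent process
        counter->fetch_add(1, std::memory_order_release);
    wait(nullptr);                                  // join the child
    return counter->load(std::memory_order_acquire);
}
```

Both processes see the same mapping (the child inherits it at the same address), so the fetch_add operations contend on the same cache line exactly as two threads would.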
Have fun verifying behaviour. :)
Since you're on Linux, you can use the gcc atomic built-in __sync_fetch_and_add() on the address for counter.
... according to the gcc documentation on atomic built-ins, this will also implement a full memory fence, not a release operation. But since you actually want a read-modify-write operation rather than simply a store (i.e., incrementing a counter is not just a store - you have to read, then modify, and finally write back the value), the full memory fence is going to be a better choice to enforce the correct memory ordering for this operation.
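A minimal sketch of that builtin on a raw address (the wrapper name is illustrative):

```cpp
// __sync_fetch_and_add atomically adds to *addr and returns the
// OLD value, with a full barrier. addr can point anywhere suitably
// aligned, including into a shared-memory segment.
int increment_counter(int* addr) {
    return __sync_fetch_and_add(addr, 1) + 1;  // return the new value
}
```

If the full barrier really is unwanted, the newer GCC builtin __atomic_fetch_add(addr, 1, __ATOMIC_RELEASE) takes an explicit memory-order argument and can do a release-only increment on just that location.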