Assume that I have two processes that share a memory block using shm_open and mmap, and that there exists a shared synchronization primitive - let's say a semaphore - that ensures exclusive access to the memory, i.e. no race conditions.
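For concreteness, a minimal sketch of the setup I mean (the name "/demo_shm" is illustrative and error handling is trimmed):

```cpp
#include <cstddef>
#include <fcntl.h>     // O_CREAT, O_RDWR
#include <sys/mman.h>  // shm_open, mmap
#include <unistd.h>    // ftruncate, close

// Map a shared memory block of the given size; both processes call this
// with the same name. Returns nullptr on failure.
void* map_shared(const char* name, std::size_t size) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : p;
}
```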
My understanding is that the pointer returned from mmap must still be marked as volatile to prevent cached reads.
Now, how does one write e.g. a std::uint64_t into an aligned position in that memory? Naturally, I would simply use std::memcpy, but it does not work with pointers to volatile memory.
// Pointer to the shared memory; assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialized "randomly" to prevent compiler
// optimization, for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(nullptr);
// Store byte by byte.
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
for (std::size_t i = 0; i < sizeof(value); ++i)
    ptr[i] = src[i];
Godbolt.
I strongly believe this solution is correct, but even with -O3 there are eight 1-byte transfers. That is really not optimal. Since I know no one is going to change the memory while I have it locked, maybe the volatile is unnecessary after all?
// Pointer to the shared memory; assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialized "randomly" to prevent compiler
// optimization, for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
// Obscure enough?
auto* real_ptr = reinterpret_cast<unsigned char*>(reinterpret_cast<std::uintptr_t>(ptr));
std::memcpy(real_ptr, src, sizeof(value));
Godbolt.
But this does not seem to work - the compiler sees through the cast and does nothing. Clang even generates a ud2 instruction, and I am not sure why. Is there UB in my code, apart from the value initialization?
This next one comes from this answer, but I think it breaks the strict aliasing rule, does it not?
// Pointer to the shared memory; assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialized "randomly" to prevent compiler
// optimization, for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
volatile std::uint64_t* dest = reinterpret_cast<volatile std::uint64_t*>(ptr);
*dest = value;
Godbolt.
GCC actually does what I want - a single instruction copying the 64-bit value. But that is useless if it is UB.
One way I could go about fixing it is to actually create a std::uint64_t object at that location. But apparently placement new does not work with volatile pointers either.
So: How can I make memcpy do the right thing? Any examples (mostly C) do not use volatile at all, should I do that too? Is an mmaped pointer treated differently already? How?
Thanks for any suggestions.
EDIT:
Both processes run on the same system. Also, please assume the values can be copied byte by byte; I am not talking about complex virtual classes storing pointers elsewhere. All integers and no floats would be just fine.
My understanding is that the pointer returned from mmap must still be marked as volatile to prevent cached reads.
Your understanding is wrong. Don't use volatile for controlling memory visibility - that isn't what it is for. It will either be unnecessarily expensive, or insufficiently strict, or both.
Consider, for example, the GCC documentation on volatile, which says:
Accesses to non-volatile objects are not ordered with respect to volatile accesses. You cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory
If you just want to avoid tearing, caching, and reordering, use <atomic> instead. For example, if you have an existing shared uint64_t (and it is correctly aligned), just access it via a std::atomic_ref<uint64_t>. You can use acquire, release, or CAS operations directly with this.
If you need normal synchronization, then your existing semaphore will be fine. As below, it already supplies whatever fences are necessary, and prevents reordering across the wait/post calls. It doesn't prevent reordering or other optimizations between them, but that's generally fine.
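In other words, under the semaphore a plain pointer and a plain std::memcpy suffice - no volatile. A sketch, assuming sem is a process-shared semaphore and base points into the mapped region (names are illustrative):

```cpp
#include <cstdint>
#include <cstring>
#include <semaphore.h>

// Write a 64-bit value at an aligned offset inside the shared region,
// holding the semaphore for the duration of the access.
void write_u64(sem_t* sem, unsigned char* base, std::size_t off, std::uint64_t value) {
    sem_wait(sem);                                  // "synchronizes memory" per POSIX
    std::memcpy(base + off, &value, sizeof value);  // typically a single 64-bit store at -O2
    sem_post(sem);                                  // synchronization point again
}
```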
As for
Any examples (mostly C) do not use volatile at all, should I do that too? Is an mmaped pointer treated differently already? How?
the answer is that whatever synchronization is used is required to also apply appropriate fences.
POSIX lists these functions as "synchronizing memory", which means they must both emit any required memory fences, and prevent inappropriate compiler reordering.
So, for example, your implementation must avoid moving memory accesses across pthread_mutex_*lock() or sem_wait()/sem_post() calls in order to be POSIX-compliant, even where it would otherwise be legal C or C++.
When you use C++'s built-in thread or atomic support, the correct semantics are part of the language standard instead of a platform extension (but shared memory isn't).