 

How to write multi-byte values to shared memory in C++14?

Assume that I have two processes that both share a memory block using shm_open and mmap and there exists a shared synchronization primitive - let's say a semaphore - that ensures exclusive access to the memory. I.e. no race conditions.

My understanding is that the pointer returned from mmap must still be marked as volatile to prevent cached reads.

Now, how does one write e.g. a std::uint64_t into any aligned position in the memory?

Naturally, I would simply use std::memcpy but it does not work with pointers to volatile memory.

First attempt

// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;

// Value to store, initialize "randomly" to prevent compiler
// optimization, for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(nullptr);

// Store byte-by-byte
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
for (std::size_t i = 0; i < sizeof(value); ++i)
    ptr[i] = src[i];

Godbolt.

I strongly believe this solution is correct, but even with -O3 the compiler emits eight 1-byte transfers. That is really not optimal.

Second Attempt

Since I know no one is going to change the memory while I have it locked, maybe the volatile is unnecessary after all?

// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;

// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);

// Obscure enough?
auto* real_ptr = reinterpret_cast<unsigned char*>(reinterpret_cast<std::uintptr_t>(ptr));

std::memcpy(real_ptr,src,sizeof(value));

Godbolt.

But this does not seem to work; the compiler sees through the cast and does nothing. Clang generates a ud2 instruction and I am not sure why - is there UB in my code, apart from the value initialization?

Third attempt

This one comes from this answer. But I think it breaks the strict aliasing rule, does it not?

// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;

// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);

volatile std::uint64_t* dest = reinterpret_cast<volatile std::uint64_t*>(ptr);
*dest = value;

Godbolt.

GCC actually does what I want - a single instruction copying the 64-bit value. But it is useless if it is UB.

One way I could fix it is to actually create a std::uint64_t object at that location. But apparently placement new does not work with volatile pointers either.

Questions

  • So, is there a better (safe) way than byte-by-byte copy?
  • I would also like to copy even larger blocks of raw bytes. Can this be done better than by individual bytes?
  • Is there any possibility to force memcpy to do the right thing?
  • Do I needlessly worry about the performance and should just go with the loop?
  • The examples I have seen (mostly C) do not use volatile at all; should I do that too? Is an mmaped pointer already treated differently? How?

Thanks for any suggestions.

EDIT:

Both processes run on the same system. Also, please assume the values can be copied byte-by-byte - I am not talking about complex virtual classes storing pointers to elsewhere. All integers and no floats would be just fine.

asked Jan 24 '23 by Quimby
1 Answer

My understanding is that the pointer returned from mmap must still be marked as volatile to prevent cached reads.

Your understanding is wrong. Don't use volatile for controlling memory visibility - that isn't what it is for. It will either be unnecessarily expensive, or insufficiently strict, or both.

Consider, for example, the GCC documentation on volatile, which says:

Accesses to non-volatile objects are not ordered with respect to volatile accesses. You cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory.

If you just want to avoid tearing, caching, and reordering - use <atomic> instead. For example, if you have an existing shared uint64_t (and it is correctly aligned), just access it via a std::atomic_ref<uint64_t>. You can use acquire, release, or CAS directly with this.

If you need normal synchronization, then your existing semaphore will be fine. As below, it already supplies whatever fences are necessary, and prevents reordering across the wait/post calls. It doesn't prevent reordering or other optimizations between them, but that's generally fine.


As for

Any examples(mostly C) do not use volatile at all, should I do that too? Is mmaped pointer treated differently already? How?

the answer is that whatever synchronization is used is required to also apply appropriate fences.

POSIX lists these functions as "synchronizing memory", which means they must both emit any required memory fences, and prevent inappropriate compiler reordering. So, for example, your implementation must avoid moving memory accesses across pthread_mutex_*lock() or sem_wait()/sem_post() calls in order to be POSIX-compliant, even where it would otherwise be legal C or C++.

When you use C++'s built-in thread or atomic support, the correct semantics are part of the language standard instead of a platform extension (though shared memory itself isn't covered by the standard).

answered Jan 30 '23 by Useless