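I am working on a single-producer, single-consumer ring buffer implementation. I have two requirements:
1. Align a single heap-allocated instance of the ring buffer to a cache line.
2. Align a field within the ring buffer to a cache line (to prevent false sharing).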
My class looks something like:
#define CACHE_LINE_SIZE 64  // To be used later.
template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
    ....
private:
    std::atomic<int64_t> publisher_sequence_;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_;  // This needs to be aligned to a cache line.
};
Let me first tackle point 1, i.e. aligning a single heap-allocated instance of the class. There are a few ways:
Use the C++11 alignas(..) specifier:
template<typename T, uint64_t num_events>
class alignas(CACHE_LINE_SIZE) RingBuffer {
public:
    ....
private:
    // All the private fields.
};
Use posix_memalign(..) + placement new(..) without altering the class definition. The drawback is that this is not platform-independent, since posix_memalign is a POSIX function:
void* buffer;
int err = posix_memalign(&buffer, 64, sizeof(processor::RingBuffer<int, kRingBufferSize>));
if (err != 0) {
    // posix_memalign reports failure through its return value, not errno.
    fprintf(stderr, "posix_memalign did not work: %s\n", strerror(err));
    abort();
}
// Use placement new on a cache aligned buffer.
auto ring_buffer = new(buffer) processor::RingBuffer<int, kRingBufferSize>();
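The snippet above only shows construction. A minimal sketch of the full lifecycle, assuming a POSIX system and using a hypothetical stand-in type Ring in place of processor::RingBuffer: because the storage did not come from new, the object has to be destroyed with an explicit destructor call and the buffer released with free().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>   // std::abort, std::free
#include <new>
#include <stdlib.h>  // posix_memalign (POSIX)

// Hypothetical stand-in for processor::RingBuffer<int, kRingBufferSize>.
struct Ring {
    std::int64_t publisher_sequence = 0;
    std::int64_t consumer_sequence = 0;
};

// Allocate 64-byte-aligned storage and construct a Ring in it.
Ring* make_aligned_ring() {
    void* buffer = nullptr;
    // posix_memalign returns an error number on failure instead of setting errno.
    if (posix_memalign(&buffer, 64, sizeof(Ring)) != 0) {
        std::fputs("posix_memalign did not work!\n", stderr);
        std::abort();
    }
    assert(reinterpret_cast<std::uintptr_t>(buffer) % 64 == 0);  // sanity check
    return new (buffer) Ring();  // placement new into the aligned storage
}

// There is no placement-delete expression: run the destructor explicitly,
// then release the storage with free(), which pairs with posix_memalign.
void destroy_aligned_ring(Ring* ring) {
    if (ring == nullptr) return;
    ring->~Ring();
    std::free(ring);  // ring is the same address posix_memalign returned
}
```

Keeping allocation and destruction in a matched pair of functions makes it harder to accidentally call delete on memory that came from posix_memalign.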
Use the GCC/Clang extension __attribute__ ((aligned(#))):
template<typename T, uint64_t num_events>
class RingBuffer {
public:
    ....
private:
    // All the private fields.
} __attribute__ ((aligned(CACHE_LINE_SIZE)));
I tried to use the C11-standardized aligned_alloc(..) function instead of posix_memalign(..), but GCC 4.8.1 on Ubuntu 12.04 could not find its declaration in stdlib.h.
Are all of these guaranteed to do the same thing? My goal is cache-line alignment, so any method that has some limit on alignment (say, double word) will not do. Platform independence, which would point to using the standardized alignas(..), is a secondary goal.
I am not clear on whether alignas(..) and __attribute__((aligned(#))) have some limit which could be below the cache line size on the machine. I can't reproduce this any more, but while printing addresses I think I did not always get 64-byte-aligned addresses with alignas(..). By contrast, posix_memalign(..) always seemed to work. Again, I cannot reproduce this any more, so maybe I was making a mistake.
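A likely explanation for the occasionally misaligned addresses (an assumption, since the original run can't be reproduced): before C++17, new-expressions were not required to honor alignments larger than the default new alignment (typically 16 bytes), so alignas(64) on the class controlled layout and stack/static objects but not necessarily heap instances. C++17 added alignment-aware operator new/delete overloads that new and delete select automatically for such over-aligned types. A minimal sketch, assuming a C++17 compiler; Aligned and is_aligned are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical over-aligned type: alignas on the class raises alignof(Aligned) to 64.
struct alignas(64) Aligned {
    std::int64_t value = 0;
};

static_assert(alignof(Aligned) == 64, "alignas(64) sets the class alignment");
static_assert(sizeof(Aligned) % 64 == 0, "size is padded to a multiple of the alignment");

inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// C++17: because alignof(Aligned) > __STDCPP_DEFAULT_NEW_ALIGNMENT__, this
// new-expression calls operator new(std::size_t, std::align_val_t), and the
// delete-expression calls the matching aligned operator delete.
void allocate_and_check() {
    Aligned* p = new Aligned;
    assert(is_aligned(p, alignof(Aligned)));
    delete p;
}
```

With a pre-C++17 compiler the same new-expression may quietly fall back to the ordinary operator new, which is consistent with getting 16-byte but not always 64-byte aligned addresses.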
The second aim is to align a field within a class/struct to a cache line. I am doing this to prevent false sharing. I have tried the following ways:
Use the C++11 alignas(..) specifier:
template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
    ...
private:
    std::atomic<int64_t> publisher_sequence_;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ alignas(CACHE_LINE_SIZE);
};
Use the GCC/Clang extension __attribute__ ((aligned(#))):
template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
    ...
private:
    std::atomic<int64_t> publisher_sequence_;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ __attribute__ ((aligned (CACHE_LINE_SIZE)));
};
Both these methods seem to place consumer_sequence_ at an offset of 64 bytes from the beginning of the object, so whether consumer_sequence_ is cache-aligned depends on whether the object itself is cache-aligned. My question here is: are there any better ways to do the same?
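That behavior is expected: alignas on a member pads it to a multiple of 64 and also raises the alignment of the whole class to 64, so the member is cache-aligned exactly when the object is (and with C++17 aligned new, heap instances of such a class are 64-byte aligned automatically). A sketch, assuming 64-byte cache lines; Sequences and kCacheLineSize are hypothetical names, and the members are public only so that offsetof can be used:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineSize = 64;  // assumption: 64-byte cache lines

// Hypothetical cut-down RingBuffer layout: producer-side and consumer-side
// sequences each start their own cache line.
struct Sequences {
    alignas(kCacheLineSize) std::atomic<std::int64_t> publisher_sequence_{0};
    std::int64_t cached_consumer_sequence_ = 0;
    void* events_ = nullptr;
    alignas(kCacheLineSize) std::atomic<std::int64_t> consumer_sequence_{0};
};

// A member's alignas also raises the alignment of the enclosing class ...
static_assert(alignof(Sequences) == kCacheLineSize, "class inherits member alignment");
// ... and the size is rounded up, so consumer_sequence_ owns the second line.
static_assert(offsetof(Sequences, consumer_sequence_) == kCacheLineSize,
              "consumer_sequence_ starts on the second cache line");
static_assert(sizeof(Sequences) == 2 * kCacheLineSize, "two full cache lines");
```

Because sizeof is rounded up to the class alignment, nothing that follows the object (for example the next element of an array) can share consumer_sequence_'s line.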
EDIT:
The reason aligned_alloc did not work on my machine was that I was on eglibc 2.15 (Ubuntu 12.04). It worked on a later version of eglibc.
From the man page: "The function aligned_alloc() was added to glibc in version 2.16."
This makes it pretty useless for me since I cannot require such a recent version of eglibc/glibc.
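Where aligned_alloc is available (glibc 2.16 or later; C++17 also declares it as std::aligned_alloc in <cstdlib>, though MSVC does not provide it), it can replace the posix_memalign call. C11 as first published required the size to be an integral multiple of the alignment, so this sketch rounds the size up first. Ring, kAlign, round_up, make_ring and destroy_ring are hypothetical names, and a C++17 compiler is assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the ring buffer.
struct Ring {
    std::int64_t publisher_sequence = 0;
    std::int64_t consumer_sequence = 0;
};

constexpr std::size_t kAlign = 64;

// Round n up to the next multiple of a; a must be a power of two.
constexpr std::size_t round_up(std::size_t n, std::size_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

Ring* make_ring() {
    void* buffer = std::aligned_alloc(kAlign, round_up(sizeof(Ring), kAlign));
    if (buffer == nullptr) throw std::bad_alloc();
    return new (buffer) Ring();  // construct in the aligned storage
}

void destroy_ring(Ring* ring) {
    if (ring == nullptr) return;
    ring->~Ring();
    std::free(ring);  // aligned_alloc memory is released with free()
}
```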
Alignment refers to the arrangement of data in memory, and specifically deals with the issue of accessing data as proper units of information from main memory. First we must conceptualize main memory as a contiguous block of consecutive memory locations. Each location contains a fixed number of bits.
Suppose memory is built from four byte-wide memory systems, with consecutive bytes spread across them so that one address selects a 4-byte word. An aligned 32-bit read needs data stored at the same address in all four memory systems, so all of them can supply their byte simultaneously. An unaligned 32-bit read requires some memory systems to return data from one address and the others to return data from the next higher address.
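The four-memory-system picture can be made concrete with a little arithmetic. Assuming 4-byte memory words (one byte from each system), an access of size bytes at address addr spans words addr / 4 through (addr + size - 1) / 4; words_touched and kWordBytes are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: memory is organised as 4-byte words, one byte from each of four systems.
constexpr std::size_t kWordBytes = 4;

// Number of memory words an access of `size` bytes (size >= 1) at `addr` spans.
constexpr std::size_t words_touched(std::uintptr_t addr, std::size_t size) {
    return (addr + size - 1) / kWordBytes - addr / kWordBytes + 1;
}

// An aligned 32-bit read needs one word; a misaligned one straddles two.
static_assert(words_touched(8, 4) == 1, "aligned read: one word");
static_assert(words_touched(6, 4) == 2, "unaligned read: two words");
```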
An aligned memory access means that the pointer (as an integer) is a multiple of a type-specific value called the alignment. The alignment is the natural address multiple at which the type must (or, for performance reasons, should) be stored on a given CPU.
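In C++ the alignment a type requires is available as alignof(T), and checking a pointer is the modulo test from the definition above. A small sketch; is_aligned and Pair are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A pointer is aligned to `alignment` when its integer value is a multiple of it.
// Alignments are always powers of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// The compiler inserts padding after `tag` so that `value` is stored at an
// offset that is a multiple of alignof(std::int32_t).
struct Pair {
    char tag;
    std::int32_t value;
};

static_assert(alignof(Pair) >= alignof(std::int32_t),
              "a struct is at least as strictly aligned as its members");
static_assert(offsetof(Pair, value) % alignof(std::int32_t) == 0,
              "members sit at offsets that respect their alignment");
```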
Unfortunately, the best I have found is allocating extra space and then using the "aligned" part. So RingBuffer's operator new can request an extra 64 bytes and return the first 64-byte-aligned address within that block. It wastes space but gives you the alignment you need. You will also need to store the address the allocator actually returned in the memory just before the pointer you hand out, so that operator delete can free it.
[Memory returned][ptr to start of memory][aligned memory][extra memory]
Something like this (assuming no class inherits from RingBuffer):
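#include <cstddef>
#include <memory>  // std::align
#include <new>
template<typename T, uint64_t num_events>
void* RingBuffer<T, num_events>::operator new(std::size_t /*request*/)
{
    static const std::size_t ptr_alloc = sizeof(void*);
    static const std::size_t align_size = 64;
    static const std::size_t request_size = sizeof(RingBuffer) + align_size;
    static const std::size_t needed = ptr_alloc + request_size;
    void* alloc = ::operator new(needed);
    // std::align takes the pointer and the space by reference, so both must be
    // modifiable lvalues (arithmetic on a void* is not valid C++ either).
    void* ptr = static_cast<char*>(alloc) + ptr_alloc;
    std::size_t space = request_size;
    ptr = std::align(align_size, sizeof(RingBuffer), ptr, space);
    static_cast<void**>(ptr)[-1] = alloc;  // save for operator delete to use
    return ptr;
}
template<typename T, uint64_t num_events>
void RingBuffer<T, num_events>::operator delete(void* ptr)
{
    if (ptr)  // deleting a null pointer is valid but a no-op; don't read ptr[-1]
    {
        void* alloc = static_cast<void**>(ptr)[-1];
        ::operator delete(alloc);
    }
}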
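Because those operators are tied to one class, the same over-allocate-and-stash technique can also be factored into a pair of free functions usable for any type. A sketch under the same assumptions; aligned_new_bytes and aligned_delete_bytes are hypothetical names, and on C++17 the standard aligned new makes this unnecessary for alignas types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>  // std::align
#include <new>

// Allocate `size` bytes aligned to `alignment` (a power of two). The address
// returned by ::operator new is stored in the slot just before the aligned block.
void* aligned_new_bytes(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t header = sizeof(void*);
    void* alloc = ::operator new(header + size + alignment);
    // Leave room for the stored pointer, then let std::align skip ahead.
    void* ptr = static_cast<char*>(alloc) + header;
    std::size_t space = size + alignment;
    ptr = std::align(alignment, size, ptr, space);  // cannot fail: `alignment` spare bytes
    assert(reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0);
    static_cast<void**>(ptr)[-1] = alloc;
    return ptr;
}

void aligned_delete_bytes(void* ptr) {
    if (ptr == nullptr) return;  // deleting null is a no-op
    ::operator delete(static_cast<void**>(ptr)[-1]);
}
```

A RingBuffer instance could then be built with placement new on the returned block and released by calling its destructor followed by aligned_delete_bytes.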
For the second requirement, having a data member of RingBuffer also be 64-byte aligned: if you know that the start of this is aligned, you can add padding to force the alignment of the data members.
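A sketch of the padding approach, assuming 64-byte cache lines and that the object itself starts on a cache-line boundary (for example via an over-aligning operator new like the one above). PaddedSequences, ProducerFields and kCacheLine are hypothetical names; the padding sizes are computed from sizeof, and the static_asserts check the resulting offsets:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines

// Producer-side fields, grouped so that their total size can be measured.
struct ProducerFields {
    std::atomic<std::int64_t> publisher_sequence_{0};
    std::int64_t cached_consumer_sequence_ = 0;
    void* events_ = nullptr;
};

// Explicit padding pushes consumer_sequence_ to offset 64 without alignas, so it
// shares no cache line with the producer fields when the object starts on a line.
struct PaddedSequences {
    ProducerFields producer;
    char pad_[kCacheLine - sizeof(ProducerFields)];
    std::atomic<std::int64_t> consumer_sequence_{0};
    // Tail padding keeps whatever follows the object off consumer_sequence_'s line.
    char tail_pad_[kCacheLine - sizeof(std::atomic<std::int64_t>)];
};

static_assert(offsetof(PaddedSequences, consumer_sequence_) == kCacheLine,
              "consumer_sequence_ starts exactly one cache line in");
static_assert(sizeof(PaddedSequences) == 2 * kCacheLine, "two full cache lines");
```

Unlike alignas, this does not raise alignof(PaddedSequences), so the cache-line guarantee holds only if the allocation itself is 64-byte aligned.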