I am creating a very fast multi-threaded discrete event simulation framework. The core of the framework uses atomics and lockless programming techniques to achieve very fast execution across many threads. This requires me to align some variables to cache lines and pad the remaining cache line space so that I don't have cache line contention. Here is how I do it:
```cpp
// compute cache line padding size
constexpr u64 CLPAD(u64 _objSize) {
    return ((_objSize / CACHELINE_SIZE) * CACHELINE_SIZE) +
           (((_objSize % CACHELINE_SIZE) > 0) * CACHELINE_SIZE) -
           _objSize;
}

alignas(CACHELINE_SIZE) MyObject myObj;
char padding[CLPAD(sizeof(myObj))];
```
This works great for me, but today I stumbled upon an issue while using this approach for a new object type. CLPAD() returns the number of chars needed to pad the input type up to the next cache line boundary. However, if I pass in a type whose size is an exact multiple of the cache line size, CLPAD() returns 0, and attempting to declare a zero-size array produces this warning/error:
ISO C++ forbids zero-size array 'padding'
I know I could modify CLPAD() to return CACHELINE_SIZE in this case, but then I'd be burning a cache line's worth of space for no reason.
How can I make the declaration of 'padding' disappear if CLPAD returns 0?
The common way to solve that problem is cache padding: inserting otherwise meaningless filler bytes between variables. This forces each variable to occupy a cache line by itself, so when other cores update their own variables, this core's cached copy of its variable is not invalidated and does not have to be fetched again.
The chunks of memory handled by the cache are called cache lines. The size of these chunks is called the cache line size. Common cache line sizes are 32, 64 and 128 bytes. A cache can only hold a limited number of lines, determined by the cache size.
Each cache line (slot) holds one block of memory. For example, with 16-byte cache lines, a 64 KB cache holds 64 KB / 16 B = 4096 cache lines.
The cache line is generally fixed in size, typically ranging from 16 to 256 bytes. The effectiveness of the line size depends on the application, and cache circuits may be configurable to a different line size by the system designer. There are also numerous algorithms for dynamically adjusting line size in real time.
Taking a page from std::aligned_storage<>, I've come up with the following:
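```cpp
// CACHELINE_SIZE and CLPAD() are the definitions from the question.

// Primary template: sizeof(T) is not a multiple of CACHELINE_SIZE, so pad it out.
template<class T, bool = false>
struct padded
{
    struct type
    {
        alignas(CACHELINE_SIZE) T myObj;
        char padding[CLPAD(sizeof(T))];
    };
};

// Specialization: sizeof(T) is already an exact multiple, so omit the padding member.
template<class T>
struct padded<T, true>
{
    struct type
    {
        alignas(CACHELINE_SIZE) T myObj;
    };
};

template<class T>
using padded_t = typename padded<T, (sizeof(T) % CACHELINE_SIZE == 0)>::type;
```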
Usage:
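```cpp
// The numbers in the comments assume CACHELINE_SIZE == 128.
struct alignas(32) my_type_1 { char c[32]; };      // char c[32] to silence MSVC warning
struct my_type_2 { char c[CACHELINE_SIZE * 2]; };  // ditto

int main()
{
    padded_t<my_type_1> pt0;
    padded_t<my_type_2> pt1;
    static_assert(sizeof(pt0) == CACHELINE_SIZE, "");             // 128
    static_assert(alignof(decltype(pt0)) == CACHELINE_SIZE, "");  // 128
    static_assert(sizeof(pt1) == 2 * CACHELINE_SIZE, "");         // 256
    static_assert(alignof(decltype(pt1)) == CACHELINE_SIZE, "");  // 128
    (void)pt0;
    (void)pt1;
}
```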
You can provide a function to access myObj however you wish.