(to provide the information you need to understand my question is a lot, however it is already compressed)
i try to implement a class template to allocate and access data cache aligned. This works very good, however trying to implement support for arrays is a problem.
Semantically the code shall provide this mapping in memory for a single element like this:
cache_aligned<element_type>* my_el =
new(cache_line_size) cache_aligned<element_type>();
| element | buffer |
the access (so far) looks like this:
*my_el; // returns cache_aligned<element_type>
**my_el; //returns element_type
*my_el->member_of_element();
HOWEVER for an array, i'd like to have this:
cache_aligned<element_type>* my_el_array =
new(cache_line_size) cache_aligned<element_type()[N];
| element 0 | buffer | element 1 | buffer | ... | element (N-1) | buffer |
So far i have the following code
template <typename T>
class cache_aligned {
private:
T instance;
public:
cache_aligned()
{}
cache_aligned(const T& other)
:instance(other.instance)
{}
static void* operator new (size_t size, uint c_line_size) {
return c_a_malloc(size, c_line_size);
}
static void* operator new[] (size_t size, uint c_line_size) {
int num_el = (size - sizeof(cache_aligned<T>*)
/ sizeof(cache_aligned<T>);
return c_a_array(sizeof(cache_aligned<T>), num_el, c_line_size);
}
static void operator delete (void* ptr) {
free_c_a(ptr);
}
T* operator-> () {
return &instance;
}
T& operator * () {
return instance;
}
};
the functions cache_aligned_malloc
void* c_a_array(uint size, ulong num_el, uint c_line_size) {
void* mem = malloc((size + c_line_size) * num_el + sizeof(void*));
void** ptr = (void**)((long)mem + sizeof(void*));
ptr[-1] = mem;
return ptr;
}
void free_c_a(void ptr) {
free(((void**)ptr)[-1]);
}
The problem is here, the access to the data should work like this:
my_el_array[i]; // returns cache_aligned<element_type>
*(my_el_array[i]); // returns element_type
my_el_array[i]->member_of_element();
My ideas to solve it, are:
(1) something similar to this, to overload sizeof operator:
static size_t operator sizeof () {
return sizeof(cache_aligned<T>) + c_line_size;
}
--> not possible since overloading sizeof operator is illegal
(2) something like this, to overload the operator [] for the pointer type:
static T& operator [] (uint index, cache_aligned<T>* ptr) {
return ptr + ((sizeof(cache_aligned<T>) + c_line_size) * index);
}
--> not possible in C++, anyway
(3) totally trivial solution
template <typename T> cache_aligned {
private:
T instance;
bool buffer[CACHE_LINE_SIZE];
// CACHE_LINE_SIZE defined as macro
public:
// trivial operators and methods ;)
};
--> i don't know whether this is reliable, actually i'm using gcc-4.5.1 in linux ...
(4) Replacing T instance; by T* instance_ptr; in the class template and using the operator [] to calculate the position of the element, like this:
| pointer-to-instance | ----> | element 0 | buffer | ... | element (N-1) | buffer |
this is not the intended semantic, since the instance of the class template becomes the bottleneck when calculating the address of the elements.
Thanks for reading! I dont' know how to shorten the problem. It would be great, if you can help! Any work around would help a lot.
I know alignment is an extension in C++0x. However, in gcc it is not available yet.
Greetz, sema
When c_line_size is compile time integral constant then of course better pad the cache_aligned with char array depending on sizeof T.
You can also check if 2 T-s fit onto one cache line and lower the alignment requirement accordingly.
Do not expect miracles from such an optimization. I think 2 times better performance for some algorithms is the ceiling that you can squeeze out from avoiding cache line splits.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With