
Class template for cache aligned memory usage in C++

(Providing the information needed to understand my question takes a lot of space; I have already compressed it as much as I can.)

I am trying to implement a class template that allocates and accesses data aligned to cache lines. This works very well for single elements, but supporting arrays turns out to be a problem.

Semantically, the code should provide the following in-memory layout for a single element:

cache_aligned<element_type>* my_el = 
          new(cache_line_size) cache_aligned<element_type>();
| element | buffer |

The access (so far) looks like this:

*my_el; // returns cache_aligned<element_type>
**my_el; // returns element_type
(*my_el)->member_of_element();

HOWEVER, for an array, I'd like to have this:

 cache_aligned<element_type>* my_el_array = 
         new(cache_line_size) cache_aligned<element_type>[N];
 | element 0 | buffer | element 1 | buffer | ... | element (N-1) | buffer |

So far I have the following code:

template <typename T>
class cache_aligned {
    private:
        T instance;
    public:
        cache_aligned()
        {}
        cache_aligned(const T& other)
        :instance(other)   // 'other' is a T, so copy it directly
        {}
        static void* operator new (size_t size, uint c_line_size) {
             return c_a_malloc(size, c_line_size);
        }
        static void* operator new[] (size_t size, uint c_line_size) {
             int num_el = (size - sizeof(cache_aligned<T>*))
                              / sizeof(cache_aligned<T>);
             return c_a_array(sizeof(cache_aligned<T>), num_el, c_line_size);
        }
        static void operator delete (void* ptr) {
             free_c_a(ptr);
        }
        T* operator-> () {
             return &instance;
        }
        T& operator * () {
             return instance;
        }
};

The allocation and deallocation helpers look like this (c_a_malloc for single elements is analogous):

void* c_a_array(uint size, ulong num_el, uint c_line_size) {
    void* mem = malloc((size + c_line_size) * num_el + sizeof(void*));
    void** ptr = (void**)((char*)mem + sizeof(void*));
    ptr[-1] = mem;   // stash the raw pointer so free_c_a can find it
    return ptr;
}

void free_c_a(void* ptr) {
    free(((void**)ptr)[-1]);
}
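
(c_a_malloc itself is not shown; the usual shape of such a helper is to over-allocate by one line, round the returned pointer up to the next line boundary, and stash the raw pointer just in front of it so free_c_a can recover it. A rough sketch of that pattern, not my exact code:)

void* c_a_malloc(uint size, uint c_line_size) {
    // over-allocate: payload + one line for rounding + one stashed pointer
    void* mem = malloc(size + c_line_size + sizeof(void*));
    if (mem == NULL)
        return NULL;
    // skip the stash slot, then round up to the next line boundary
    char* aligned = (char*)mem + sizeof(void*);
    size_t mis = (size_t)aligned % c_line_size;
    if (mis != 0)
        aligned += c_line_size - mis;
    ((void**)aligned)[-1] = mem;   // free_c_a reads this back
    return aligned;
}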

The problem is here: built-in indexing with my_el_array[i] advances by sizeof(cache_aligned<element_type>), which does not include the per-element buffer. The access to the data should work like this:

my_el_array[i]; // returns cache_aligned<element_type>
*(my_el_array[i]); // returns element_type
my_el_array[i]->member_of_element();

My ideas to solve it are:

(1) Something similar to this, to overload the sizeof operator:

static size_t operator sizeof () {
   return sizeof(cache_aligned<T>) + c_line_size;
}

--> not possible, since overloading the sizeof operator is illegal

(2) Something like this, to overload operator [] for the pointer type:

static T& operator [] (uint index, cache_aligned<T>* ptr) {
    return ptr + ((sizeof(cache_aligned<T>) + c_line_size) * index);
}

--> not possible in C++, anyway

(3) A totally trivial solution:

template <typename T> class cache_aligned {
    private:
          T instance;
          bool buffer[CACHE_LINE_SIZE]; 
          // CACHE_LINE_SIZE defined as macro
    public:
          // trivial operators and methods ;)
};

--> I don't know whether this is reliable; I'm using gcc-4.5.1 on Linux ... (a padded variant along these lines is sketched after idea (4) below)

(4) Replacing T instance; by T* instance_ptr; in the class template and using the operator [] to calculate the position of the element, like this:

| pointer-to-instance | ----> | element 0 | buffer | ... | element (N-1) | buffer |

This is not the intended semantics, since the instance of the class template (which holds the pointer) becomes the bottleneck when calculating the addresses of the elements.
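
Coming back to idea (3): gcc's aligned type attribute (available in 4.5, unlike the C++0x alignment specifier) can make sizeof(cache_aligned<T>) itself a multiple of the cache line size. Then plain my_el_array[i] indexing steps exactly one padded element at a time, and the custom operator new/new[] only has to hand back line-aligned memory. A minimal sketch, assuming a fixed 64-byte line and posix_memalign; names and details are illustrative:

#include <cstddef>   // std::size_t
#include <cstdlib>   // posix_memalign, std::free
#include <new>       // std::bad_alloc

#define CACHE_LINE_SIZE 64   // assumed; adjust for the target CPU

template <typename T>
class __attribute__((aligned(CACHE_LINE_SIZE))) cache_aligned {
    private:
        T instance;
        // the aligned attribute makes gcc round sizeof(cache_aligned<T>)
        // up to a multiple of CACHE_LINE_SIZE, so built-in indexing and
        // pointer arithmetic land on element boundaries
    public:
        cache_aligned() {}
        cache_aligned(const T& other) : instance(other) {}

        static void* operator new (std::size_t size)   { return allocate(size); }
        static void* operator new[] (std::size_t size) { return allocate(size); }
        static void operator delete (void* ptr)   { std::free(ptr); }
        static void operator delete[] (void* ptr) { std::free(ptr); }

        T* operator-> () { return &instance; }
        T& operator * () { return instance; }

    private:
        static void* allocate(std::size_t size) {
            void* mem = 0;
            // posix_memalign returns memory aligned to the line size,
            // so element 0 starts on a cache line boundary
            if (posix_memalign(&mem, CACHE_LINE_SIZE, size) != 0)
                throw std::bad_alloc();
            return mem;
        }
};

With this, new cache_aligned<element_type>[N] followed by my_el_array[i]->member_of_element() works with ordinary pointer arithmetic, since every element plus its padding occupies a whole number of cache lines.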

Thanks for reading! I don't know how to shorten the problem any further. It would be great if you could help! Any workaround would help a lot.

I know there is an alignment specifier in C++0x; however, it is not available in gcc yet.

Greetz, sema

asked Feb 16 '26 17:02 by sema

1 Answer

When c_line_size is a compile-time integral constant, it is of course better to pad cache_aligned with a char array whose size depends on sizeof(T).

You can also check whether two Ts fit onto one cache line and lower the alignment requirement accordingly.

Do not expect miracles from such an optimization. I think about a 2x performance improvement for some algorithms is the ceiling you can squeeze out of avoiding cache-line splits.
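
A minimal sketch of that padding idea with an assumed compile-time line size of 64 (the names are only illustrative):

#include <cstddef>

template <typename T, std::size_t LINE = 64>
struct padded {
    T value;
    // pad to the next multiple of LINE bytes; when sizeof(T) divides LINE
    // evenly you can drop the pad entirely, because elements of a
    // line-aligned array then never straddle a cache line boundary
    char pad[LINE - sizeof(T) % LINE];
};

Since sizeof(padded<T>) comes out as a multiple of the line size, ordinary array indexing steps from one padded element to the next, which is what the operator [] problem above needs.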

answered Feb 19 '26 07:02 by Öö Tiib