
How to write cache friendly polymorphic code in C++?

I am writing performance-critical code in which I need to handle a large number of objects polymorphically. Let's say I have a class A and a class B that is derived from A. I could create a vector of Bs like this:

vector<A*> a(n);
for(int i = 0; i < n; i++)
  a[i] = new B();

but if n is large (on the order of 10^6 or more in my case), this requires a very large number of calls to new, and the n objects could end up scattered all over main memory, resulting in very poor cache performance. What would be the right way to deal with this situation? I am thinking of doing something like the following to keep all the objects in one contiguous memory region:

B* b = new B[n];
vector<A*> a(n);
for(int i = 0; i < n; i++)
  a[i] = b + i;

but one problem is how to free the memory allocated by new B[n] if b is no longer available (but we still have a). I have just learnt that trying

delete[] a[0];

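is not a good idea: the static type of a[0] is A* while the array actually holds B objects, and delete[] through a pointer of a different type is undefined behavior.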

Martin asked Mar 10 '11 14:03




2 Answers

If you know for sure that those will only be objects of type B, why not use a parallel vector? The pointers stay valid as long as storage is never resized:

vector<B> storage(n);
vector<A*> pointers(n);
for(int i = 0; i < n; i++)
   pointers[i] = &storage[i];
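
To make the ownership question concrete, here is a minimal sketch of the parallel-vector idea wrapped in one type, so that the contiguous storage is released automatically and never depends on a separate b pointer. The names BPool, at and sum_values, and the bodies of A and B, are hypothetical stand-ins and not part of the question; the sketch assumes every element really is a B.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the question's A and B.
struct A {
    virtual ~A() = default;
    virtual int value() const { return 0; }
};

struct B : A {
    int payload = 1;
    int value() const override { return payload; }
};

// Owns the B objects contiguously and exposes them through A*.
// The pointers stay valid only while `storage` is never resized.
struct BPool {
    std::vector<B> storage;    // contiguous block, freed automatically
    std::vector<A*> pointers;  // non-owning views into storage

    explicit BPool(std::size_t n) : storage(n), pointers(n) {
        for (std::size_t i = 0; i < n; ++i)
            pointers[i] = &storage[i];
    }

    // Copying would leave `pointers` aimed at the other pool's storage.
    BPool(const BPool&) = delete;
    BPool& operator=(const BPool&) = delete;

    // Bounds-checked (in debug builds) polymorphic access.
    A& at(std::size_t i) {
        assert(i < pointers.size());
        return *pointers[i];
    }
};

// Sums via virtual dispatch while walking memory in order.
long sum_values(const BPool& pool) {
    long total = 0;
    for (const A* p : pool.pointers)
        total += p->value();
    return total;
}
```

Nothing ever needs to call delete here: when the BPool goes out of scope, the vector destroys the B objects and frees the block. If the element type is always B, iterating over storage directly avoids the pointer indirection altogether.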

sharptooth answered Oct 17 '22 07:10


You can use placement new to construct an object at a particular memory location:

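const size_t object_size = sizeof(B);   // slot size; requires #include <new>
char* storage = static_cast<char*>(::operator new(n * object_size)); // one raw block
vector<A*> a(n);
for(int i = 0; i < n; i++)
  a[i] = new(storage + i*object_size) B(); // construct in place, no per-object allocation
// when done, invoke the destructors manually (assuming A has a virtual destructor!)
for(int i = 0; i < n; i++)
  a[i]->~A();
::operator delete(storage); // then release the raw block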

But you cannot solve the 'real' problem without giving up the contiguous storage: if one object is freed, it leaves a hole in the storage, and over time this leads to high fragmentation. The only remedy is to keep track of the freed slots and reuse them for new objects.
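
Tracking and reusing freed slots could look roughly like the sketch below: a fixed-capacity pool that keeps a list of free slot indices and uses placement new to construct into them. SlotPool, create, destroy and available are hypothetical names chosen for illustration, and the sketch assumes that every object is a B, that B is not over-aligned, and that all objects are handed back before the pool is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-ins for the question's A and B.
struct A {
    virtual ~A() = default;
    virtual int value() const { return 0; }
};
struct B : A {
    int value() const override { return 1; }
};

// Fixed-capacity pool: one contiguous block whose freed slots are recycled.
class SlotPool {
public:
    explicit SlotPool(std::size_t capacity)
        : block_(static_cast<unsigned char*>(::operator new(capacity * sizeof(B)))) {
        // Push indices in reverse so that slot 0 is handed out first.
        for (std::size_t i = capacity; i-- > 0;)
            free_.push_back(i);
    }
    SlotPool(const SlotPool&) = delete;
    SlotPool& operator=(const SlotPool&) = delete;

    // Assumes every object was already returned through destroy().
    ~SlotPool() { ::operator delete(block_); }

    // Constructs a B in a free slot; returns nullptr when the pool is full.
    A* create() {
        if (free_.empty()) return nullptr;
        std::size_t slot = free_.back();
        free_.pop_back();
        return new (block_ + slot * sizeof(B)) B();
    }

    // Runs the destructor and marks the slot as reusable.
    void destroy(A* obj) {
        B* b = static_cast<B*>(obj);  // the pool only ever holds B objects
        std::size_t slot = static_cast<std::size_t>(
            reinterpret_cast<unsigned char*>(b) - block_) / sizeof(B);
        b->~B();
        free_.push_back(slot);
    }

    std::size_t available() const { return free_.size(); }

private:
    unsigned char* block_;           // raw contiguous storage
    std::vector<std::size_t> free_;  // indices of unused slots
};
```

The most recently freed slot is reused first, which tends to keep hot memory hot, but live objects can still end up interleaved with holes, so iteration order over the a vector no longer matches memory order once slots have been recycled.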

Alexander Gessler answered Oct 17 '22 08:10