Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ Cache friendly way of accessing all the members of all elements of a `vector <struct_type>`

I'm interested in optimizing my code for multithreaded computing. In terms of the cache, pipelining, or any other aspects of memory access, how do the following compare for conserving those resources:

Case 1

struct something{
    float a;
    float b;
    int c;
    bool d;
};

vector <something> vec(n, something());

for(int q=0; q<n; q++)
    {
         vec[q].a = expression1;
         vec[q].b = expression2;
         vec[q].c = expression3;
         vec[q].d = expression4;
    } 

Case 2

struct something{
    float a;
    float b;
    int c;
    bool d;
};

vector <something> vec(n, something());

for(int q=0; q<n; q++)
    vec[q].a = expression1;
for(int q=0; q<n; q++)
    vec[q].b = expression2;
for(int q=0; q<n; q++)
    vec[q].c = expression3;
for(int q=0; q<n; q++)
    vec[q].d = expression4;

Case 3

vector <float> a(n);
vector <float> b(n);
vector <int>   c(n);
vector <bool>  d(n); 

for(int q=0; q<n; q++)
    a[q] = expression1;
for(int q=0; q<n; q++)
    b[q] = expression2;
for(int q=0; q<n; q++)
    c[q] = expression3;
for(int q=0; q<n; q++)
    d[q] = expression4;

Also, are there better ways of approaching the above?

like image 787
Matt Munson Avatar asked Oct 09 '22 09:10

Matt Munson


2 Answers

  • Case 1 is the most readable.
  • Case 1 and case 3 are equally cache friendly. Both make only one pass through all the data.*
  • Case 2 is the worst because it makes 4 passes over the data - each pass only touching one element.

If all the struct fields are different, then case 3 has a huge advantage of possibly being vectorizable while case 1 doesn't.

The reason for this is because case 3 is the struct of arrays packing that puts all the same datatypes together sequentially in memory - thereby exposing vectorization.

EDIT :

*Case 3 is potentially even more cache friendly than case 1 because it doesn't need struct-padding - so the data size is smaller.

like image 80
Mysticial Avatar answered Oct 11 '22 22:10

Mysticial


In terms of cache access, Case 2 is clearly the worst: it will reload memory into cache 4 times.

Case 3 is the same as Case 1 when filling data, but may be worse for later use (assuming that a b c d are related and will likely be read together).

This one is even better than case 1:

for (vector<something>::iterator it = vec.begin(); it != vec.end(); ++it)
{
    it->a = e1;
    it->b = e2;
    it->c = e3;
    it->d = e4;
}

What will be faster depends on many things. For example, computing complex expressions in wrong order may be much worse than any cache misses. You should never make pure theoretical choices without doing real profiling.

like image 39
hamstergene Avatar answered Oct 11 '22 21:10

hamstergene