C++ Cache friendly way of accessing all the members of all elements of a `vector `

Question

I'm interested in optimizing my code for multithreaded computing. In terms of the cache, pipelining, or any other aspects of memory access, how do the following compare for conserving those resources:

Case 1

struct something{
    float a;
    float b;
    int c;
    bool d;
};

vector <something> vec(n, something());

for(int q=0; q<n; q++)
    {
         vec[q].a = expression1;
         vec[q].b = expression2;
         vec[q].c = expression3;
         vec[q].d = expression4;
    }

Case 2

struct something{
    float a;
    float b;
    int c;
    bool d;
};

vector <something> vec(n, something());

for(int q=0; q<n; q++)
    vec[q].a = expression1;
for(int q=0; q<n; q++)
    vec[q].b = expression2;
for(int q=0; q<n; q++)
    vec[q].c = expression3;
for(int q=0; q<n; q++)
    vec[q].d = expression4;

Case 3

vector <float> a(n);
vector <float> b(n);
vector <int>   c(n);
vector <bool>  d(n); 

for(int q=0; q<n; q++)
    a[q] = expression1;
for(int q=0; q<n; q++)
    b[q] = expression2;
for(int q=0; q<n; q++)
    c[q] = expression3;
for(int q=0; q<n; q++)
    d[q] = expression4;

Also, are there better ways of approaching the above?

Mysticial · Accepted Answer

Case 1 is the most readable.
Case 1 and case 3 are equally cache friendly. Both make only one pass through all the data.*
Case 2 is the worst because it makes 4 passes over the data - each pass only touching one element.

If all the struct fields are different, then case 3 has a huge advantage of possibly being vectorizable while case 1 doesn't.

The reason for this is because case 3 is the struct of arrays packing that puts all the same datatypes together sequentially in memory - thereby exposing vectorization.

EDIT :

*Case 3 is potentially even more cache friendly than case 1 because it doesn't need struct-padding - so the data size is smaller.

hamstergene · Answer

In terms of cache access, Case 2 is clearly the worst: it will reload memory into cache 4 times.

Case 3 is the same as Case 1 when filling data, but may be worse for later use (assuming that a b c d are related and will likely be read together).

This one is even better than case 1:

for (vector<something>::iterator it = vec.begin(); it != vec.end(); ++it)
{
    it->a = e1;
    it->b = e2;
    it->c = e3;
    it->d = e4;
}

What will be faster depends on many things. For example, computing complex expressions in wrong order may be much worse than any cache misses. You should never make pure theoretical choices without doing real profiling.

C++ Cache friendly way of accessing all the members of all elements of a `vector <struct_type>`

Tags:

c++

optimization

memory

struct

cache-control

Matt Munson

2 Answers

Mysticial

hamstergene

Recent Activity

Donate For Us