 

C++ load and store optimizations and heap objects

Tags: c++, simd, sse

I am trying to wrap my head around memory accesses to intrinsic types that have or haven't been loaded into registers.

Assume some SIMD functions that accept references to float arrays. For example,

void do_something(std::array<float, 4>& arr);
void do_something_else(std::array<float, 4>& arr);

Each function first loads the data into registers, performs its operation, then stores the result back into the array. Consider the following snippet:

std::array<float, 4> my_arr{0.f, 0.f, 0.f, 0.f};
do_something(my_arr);
do_something_else(my_arr);
do_something(my_arr);

Does the C++ compiler optimize out the unnecessary loads and stores between function calls? Does this even matter?

I've seen libraries that wrap an __m128 in a struct and perform the load in the constructor. What happens when you store these on the heap and try to call intrinsics on them? For example,

struct vec4 {
    vec4(std::array<float, 4>& arr) {
        data = _mm_loadu_ps(arr.data()); // unaligned load of the four floats
    }

    __m128 data;
};

std::vector<vec4> my_vecs;
// do SIMD work

Do you have to load/store the data on every access? Or should these classes declare a private operator new, so they can't be stored on the heap?

scx asked Mar 02 '26 11:03



1 Answer

If the compiler compiles the functions separately from the calls, it cannot optimize out the stores and loads. This is definitely the case when the functions are defined in one .cpp file, the calls are in another .cpp file, and link-time optimization is not enabled.

However, if the compiler

  1. sees the function definitions and their calls at the same time (or during link time optimization),

  2. decides to inline the function calls and

  3. decides to fuse the loops,

then it will likely remove the unnecessary stores and loads.

Note, however, that none of the three points is trivial. The programmer controls only the first point; the other two are entirely at the discretion of the compiler. Consequently, you should generally assume that such optimizations do not happen. The chances of inlining rise somewhat if your functions are templates (which also takes care of point 1, since template definitions are normally visible at the call site), but whether the compiler actually fuses the loops is out of your control.
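Since the optimization cannot be relied upon, one option is to write the single load and single store explicitly. The following is only a sketch, assuming an x86 target with SSE; the names do_something_reg, do_something_else_reg and run_pipeline, and the arithmetic inside them, are hypothetical stand-ins for the functions in the question.

```cpp
#include <array>
#include <cassert>
#include <immintrin.h>

// Hypothetical register-level kernels: they take and return __m128 by value,
// so the source itself asks for no load or store between them.
inline __m128 do_something_reg(__m128 v) {
    return _mm_add_ps(v, _mm_set1_ps(1.0f));   // v + 1 (placeholder operation)
}

inline __m128 do_something_else_reg(__m128 v) {
    return _mm_mul_ps(v, _mm_set1_ps(2.0f));   // v * 2 (placeholder operation)
}

// One explicit load at the start and one explicit store at the end.
inline void run_pipeline(std::array<float, 4>& arr) {
    // Unaligned load: std::array<float, 4> only guarantees alignof(float).
    __m128 v = _mm_loadu_ps(arr.data());
    v = do_something_reg(v);
    v = do_something_else_reg(v);
    v = do_something_reg(v);
    _mm_storeu_ps(arr.data(), v);
}
```

Calling run_pipeline(my_arr) then replaces the three separate calls. Whether the intermediate values really stay in registers still depends on inlining and on the calling convention, but no memory round trip is requested in the source.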


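Regarding structs that contain SIMD types: it is perfectly legal for a SIMD type to reside on the heap. Provided the allocation honors the type's alignment (16 bytes for __m128), there is no difference from it being allocated on the stack; the compiler emits whatever loads and stores it needs when you pass the member to an intrinsic.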
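As a minimal sketch of the heap case, assuming an x86 target with SSE and a C++17 compiler: the vec4 below mirrors the struct from the question (with a const reference parameter, which is an assumption), while add_to_all and is_16_byte_aligned are hypothetical helpers added for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>
#include <immintrin.h>

struct vec4 {
    // Load once on construction; unaligned load because std::array<float, 4>
    // is only guaranteed to be aligned like a float.
    explicit vec4(const std::array<float, 4>& a)
        : data(_mm_loadu_ps(a.data())) {}

    __m128 data;
};

static_assert(alignof(vec4) == 16, "vec4 inherits the alignment of __m128");

// Hypothetical helper: adds `delta` to every element of a heap-allocated
// vector. The member is used directly; no explicit load/store call is written.
inline void add_to_all(std::vector<vec4>& vs, const vec4& delta) {
    for (vec4& v : vs) {
        v.data = _mm_add_ps(v.data, delta.data);
    }
}

// Hypothetical helper to check where the allocator placed an element.
inline bool is_16_byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

A std::vector<vec4> filled this way can be passed straight to add_to_all. Nothing here requires a private operator new; the elements are ordinary objects that happen to need 16-byte alignment.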

However, you cannot simply alias a std::array<float, 4> with a __m128; that would violate the strict aliasing rules. Converting a std::array<float, 4> to a __m128 is only safe via a copy (reinterpret as char*, copy the bytes, reinterpret as __m128) or via the load/store intrinsics; otherwise your compiler is allowed to mix up the accesses to the array and the SIMD type.
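The copy-based conversion can be sketched as follows, assuming an x86 target with SSE. The function names to_m128, to_array and to_m128_memcpy are hypothetical; _mm_loadu_ps, _mm_storeu_ps and std::memcpy are the real calls doing the work.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <immintrin.h>

// Copy into a SIMD value with the unaligned load intrinsic.
inline __m128 to_m128(const std::array<float, 4>& a) {
    return _mm_loadu_ps(a.data());
}

// Copy back out with the unaligned store intrinsic.
inline std::array<float, 4> to_array(__m128 v) {
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), v);
    return out;
}

// Equivalent byte-wise copy, the "reinterpret as char*, copy" route.
inline __m128 to_m128_memcpy(const std::array<float, 4>& a) {
    static_assert(sizeof(__m128) == 4 * sizeof(float), "size mismatch");
    __m128 v;
    std::memcpy(&v, a.data(), sizeof v);
    return v;
}
```

Neither variant ever forms a __m128 pointer or reference to the array's storage, so the array and the SIMD value remain separate objects. Compilers commonly reduce both forms to a single move instruction, though that is not guaranteed.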

cmaster - reinstate monica answered Mar 04 '26 01:03



