I am trying to wrap my head around memory accesses to intrinsic types that have or haven't been loaded into registers.
Assume some SIMD functions that accept references to float arrays, for example:
void do_something(std::array<float, 4>& arr);
void do_something_else(std::array<float, 4>& arr);
Each function first loads the data into registers, performs its operation, then stores the result back into the array. Assume the following snippet:
std::array<float, 4> my_arr{0.f, 0.f, 0.f, 0.f};
do_something(my_arr);
do_something_else(my_arr);
do_something(my_arr);
Does the C++ compiler optimize out the unnecessary loads and stores between function calls? Does this even matter?
I've seen libraries that wrap an __m128 type in a struct, and call the load in the constructor. What happens when you store these on the heap and try to call intrinsics on them? For example,
struct vec4 {
    vec4(const std::array<float, 4>& arr) {
        data = _mm_loadu_ps(arr.data()); // do load
    }
    __m128 data;
};
std::vector<vec4> my_vecs;
// do SIMD work
Do you have to load/store the data every access? Or should these classes declare a private operator new, so they aren't stored on the heap?
If the compiler compiles the functions separately from the calls, it cannot optimize out the stores and loads. This is definitely the case when the functions are in one .cpp file, the calls are in another .cpp file, and link-time optimization is not enabled.
However, if the compiler
1. sees the function definitions and their calls at the same time (or during link-time optimization),
2. decides to inline the function calls, and
3. decides to fuse the loops,
then it will likely remove the unnecessary stores and loads.
Note, however, that none of the three points is trivial. The programmer only controls the first point; the other two are 100% at the discretion of the compiler. Consequently, you generally have to assume that such optimizations do not happen. Chances for inlining rise a bit if your functions are actually templates (which also guarantees that point 1 is satisfied), but whether the compiler actually fuses the loops is out of your control.
Regarding structs that contain SIMD types: It's perfectly legal for a SIMD type to reside on the heap. There's absolutely no difference from it being allocated on the stack.
However, you cannot just alias a std::array<float, 4> with a __m128; that would violate strict aliasing rules. Safe reinterpretation always involves a copy: either the load/store intrinsics (_mm_loadu_ps / _mm_storeu_ps), which are defined to read and write through float pointers, or std::memcpy. If you instead reinterpret_cast the array pointer, the compiler is allowed to mix up the accesses to the array and the SIMD type.