What is the "correct" (i.e., portable) way in LLVM to load data from memory into a SIMD vector?
Looking at the typical IR generated by LLVM's auto-vectorizer for an x86 target, the pattern seems to be: bitcast the scalar pointer (e.g., double*) to the corresponding vector pointer type (e.g., <4 x double>*), then issue a load through the bitcast pointer. In the case of AVX, this pattern maps nicely to SIMD intrinsics such as _mm256_loadu_pd() and friends. However, I have no idea whether this strategy would also be correct for other ISAs (e.g., NEON, AltiVec).
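For reference, a minimal sketch of the pattern described above, in (typed-pointer) LLVM IR; the function name is made up for illustration:

```llvm
; Load 4 contiguous doubles starting at %p by bitcasting the
; scalar pointer to a vector pointer and loading through it.
define <4 x double> @load_vec(double* %p) {
  %vp = bitcast double* %p to <4 x double>*
  ; Note the conservative "align 8" (the scalar alignment):
  ; the address is not guaranteed to be 32-byte aligned, which
  ; is what lets this lower to an unaligned load such as
  ; _mm256_loadu_pd() on AVX.
  %v = load <4 x double>, <4 x double>* %vp, align 8
  ret <4 x double> %v
}
```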
I haven't been able to find info on the topic in the LLVM docs. Am I missing something obvious?
Having spent some more time thinking about this, I believe that a portable solution may be the following: in order to load values from a memory location into a SIMD vector, load them one by one as scalars and assemble the vector via a chain of insertelement instructions. Similarly, in order to store the values in a SIMD vector to a memory location, extract the vector elements as scalars via the extractelement instruction and store them one by one.
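A sketch of the scalarised load and store in (typed-pointer) LLVM IR; the function names are made up for illustration:

```llvm
; Scalarised load: fetch each element and assemble the vector
; via a chain of insertelement instructions.
define <4 x double> @load_vec_scalar(double* %p) {
  %p1 = getelementptr double, double* %p, i64 1
  %p2 = getelementptr double, double* %p, i64 2
  %p3 = getelementptr double, double* %p, i64 3
  %e0 = load double, double* %p
  %e1 = load double, double* %p1
  %e2 = load double, double* %p2
  %e3 = load double, double* %p3
  %v0 = insertelement <4 x double> undef, double %e0, i32 0
  %v1 = insertelement <4 x double> %v0, double %e1, i32 1
  %v2 = insertelement <4 x double> %v1, double %e2, i32 2
  %v3 = insertelement <4 x double> %v2, double %e3, i32 3
  ret <4 x double> %v3
}

; Scalarised store: extract each element and store it individually.
define void @store_vec_scalar(<4 x double> %v, double* %p) {
  %p1 = getelementptr double, double* %p, i64 1
  %p2 = getelementptr double, double* %p, i64 2
  %p3 = getelementptr double, double* %p, i64 3
  %e0 = extractelement <4 x double> %v, i32 0
  %e1 = extractelement <4 x double> %v, i32 1
  %e2 = extractelement <4 x double> %v, i32 2
  %e3 = extractelement <4 x double> %v, i32 3
  store double %e0, double* %p
  store double %e1, double* %p1
  store double %e2, double* %p2
  store double %e3, double* %p3
  ret void
}
```

Since this form never loads or stores a whole vector, it makes no alignment or layout assumptions and relies on the backend (e.g., the SLP-style pattern matching in instcombine/codegen) to fuse it back into a single SIMD memory operation.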
In my experiments, the LLVM optimizer was always successful in recognising these patterns and fusing them into direct SIMD load/store instructions.
However, this strategy also results in noticeable bloat in the generated IR and, consequently, in degraded compilation times. Hence, for the time being I'll stick to the direct bitcasting approach, and perhaps implement this other approach as a fallback if the bitcasting method fails on specific setups.