 

SIMD vector memory load in LLVM

What is the "correct" (i.e., portable) way in LLVM to load data from memory into a SIMD vector?

Looking at the typical IR generated by LLVM's auto-vectorizer for an x86 target, it seems like the pattern is:

  • bitcast a pointer to the scalar type (e.g., double *) to the corresponding vector type (e.g., <4 x double>*),
  • load through the converted pointer, specifying the alignment of the scalar type rather than the (larger) natural alignment of the vector type.

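A minimal sketch of this pattern in LLVM IR (typed-pointer syntax, as used at the time; the function name is just for illustration):

```llvm
define <4 x double> @vec_load(double* %p) {
  ; reinterpret the scalar pointer as a vector pointer
  %vp = bitcast double* %p to <4 x double>*
  ; load with the scalar alignment (8 bytes), not the vector's
  ; natural 32-byte alignment, so unaligned addresses stay valid
  %v = load <4 x double>, <4 x double>* %vp, align 8
  ret <4 x double> %v
}
```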
In the case of AVX, this pattern maps nicely to SIMD intrinsics such as _mm256_loadu_pd() and friends. However, I have no idea if this strategy would also be correct for other ISAs (e.g., Neon, AltiVec).

I haven't been able to find info on the topic in the LLVM docs. Am I missing something obvious?

bluescarni asked Jul 25 '20 15:07

1 Answer

Having spent some more time thinking about this, I believe that a portable solution may be the following:

  • load the scalar values one by one from memory in the usual (non-SIMD) way,
  • immediately build a vector with repeated insertelement instructions.

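The load side of this fallback might look like the following sketch in LLVM IR (typed-pointer syntax; names are illustrative):

```llvm
define <4 x double> @vec_load_scalar(double* %p) {
  ; load the four scalars individually
  %p1 = getelementptr double, double* %p, i64 1
  %p2 = getelementptr double, double* %p, i64 2
  %p3 = getelementptr double, double* %p, i64 3
  %x0 = load double, double* %p, align 8
  %x1 = load double, double* %p1, align 8
  %x2 = load double, double* %p2, align 8
  %x3 = load double, double* %p3, align 8
  ; assemble the vector lane by lane
  %v0 = insertelement <4 x double> undef, double %x0, i32 0
  %v1 = insertelement <4 x double> %v0, double %x1, i32 1
  %v2 = insertelement <4 x double> %v1, double %x2, i32 2
  %v3 = insertelement <4 x double> %v2, double %x3, i32 3
  ret <4 x double> %v3
}
```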
Similarly, in order to store the values in a SIMD vector to a memory location, extract the vector elements as scalars via the extractelement instruction and store them one by one.
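The store direction, sketched the same way (again illustrative typed-pointer IR):

```llvm
define void @vec_store_scalar(<4 x double> %v, double* %p) {
  ; extract the lanes as scalars
  %x0 = extractelement <4 x double> %v, i32 0
  %x1 = extractelement <4 x double> %v, i32 1
  %x2 = extractelement <4 x double> %v, i32 2
  %x3 = extractelement <4 x double> %v, i32 3
  ; store them one by one at scalar alignment
  %p1 = getelementptr double, double* %p, i64 1
  %p2 = getelementptr double, double* %p, i64 2
  %p3 = getelementptr double, double* %p, i64 3
  store double %x0, double* %p, align 8
  store double %x1, double* %p1, align 8
  store double %x2, double* %p2, align 8
  store double %x3, double* %p3, align 8
  ret void
}
```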

In my experiments, the LLVM optimizer was always successful in recognising these patterns and fusing them into direct SIMD load/store instructions.

However, this strategy also results in noticeable bloat in the size of the generated IR and a consequent degradation in compilation times. Hence, for the time being I'll stick to the direct bitcasting approach, and perhaps implement this other approach as a fallback if the bitcasting method fails on specific setups.

bluescarni answered Sep 30 '22 07:09