I would like to enable NEON vectorization on my ARM Cortex-A9, but the compiler gives me this output:
"not vectorized: relevant stmt not supported: D.14140_82 = D.14143_77 * D.14141_81"
Here is my loop:
#include <arm_neon.h>  /* provides float32_t */

/* SIZE is defined elsewhere (not shown here) */
void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out){
    for(int i=0; i<SIZE*4; i+=1){
        out[i] = data1[i]*data2[i];
    }
}
And the options I compile with:
-march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad -ftree-vectorizer-verbose=2
I am using the arm-linux-gnueabi GCC compiler (v4.6).
It is important to note that the problem only appears with float32 vectors. If I switch to int32, the loop is vectorized. Maybe vectorization of float32 is not yet supported…
Does anyone have an idea? Am I missing something on the command line or in my implementation?
Thanks in advance for your help.
Guix
From GCC's ARM options page:
-mfpu=name
...
If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=`neon'), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
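To make the denormal caveat in that note concrete, here is a small host-side C sketch (it is not NEON code). It assumes an IEEE 754 host FPU running with default settings (no flush-to-zero). The helper names `ieee_mul`, `flush_to_zero` and `neon_like_mul` are hypothetical, made up for illustration: the last one only emulates in software what the note says NEON does, namely treating denormal inputs and results as zero.

```c
#include <float.h>

/* Hypothetical helper: emulate flush-to-zero in software.
   A nonzero value smaller in magnitude than FLT_MIN is a denormal
   (subnormal); replace it with a zero of the same sign. */
static float flush_to_zero(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}

/* Plain IEEE 754 multiply with gradual underflow: denormal results are kept. */
static float ieee_mul(float a, float b)
{
    volatile float r = a * b;   /* volatile prevents compile-time folding */
    return r;
}

/* Hypothetical emulation of a NEON-style multiply: denormal inputs and
   denormal results are both treated as zero. */
static float neon_like_mul(float a, float b)
{
    return flush_to_zero(ieee_mul(flush_to_zero(a), flush_to_zero(b)));
}
```

With `tiny = 1e-20f`, `ieee_mul(tiny, tiny)` is roughly 1e-40, a nonzero denormal below `FLT_MIN`, while `neon_like_mul(tiny, tiny)` gives 0. Values well inside the normal range, such as `2.0f * 3.0f`, come out identical either way, so the loss of precision only matters if your data can get that close to zero.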
If you add -funsafe-math-optimizations to your command line, it should work, but reread the note above if you need full IEEE 754 precision, in particular for values very close to zero.
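If you would rather not enable -funsafe-math-optimizations for the whole file, another option is to write the NEON code by hand with the intrinsics from <arm_neon.h>. As far as I know, the auto-vectorizer's float restriction does not apply to explicit intrinsics. Below is a minimal sketch of `my_mul` rewritten that way, generalized to a hypothetical `mul_arrays` that takes an explicit element count `n`. The `#if` guard and the scalar loop are additions of mine so the same file also builds on non-NEON targets and handles a length that is not a multiple of 4. Because `vmulq_f32` still runs on the NEON unit, the denormal caveat from the note applies here too.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: element-wise product out[i] = a[i] * b[i], i < n.
   On a NEON target the main loop processes four floats per iteration
   with explicit intrinsics; the scalar loop handles the remaining
   elements, or the whole array when NEON is not available. */
void mul_arrays(const float *a, const float *b, float *out, int n)
{
    int i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(out + i, vmulq_f32(va, vb));  /* multiply, store 4 results */
    }
#endif
    for (; i < n; ++i)  /* scalar tail or non-NEON fallback */
        out[i] = a[i] * b[i];
}
```

Calling `mul_arrays(data1, data2, out, SIZE*4)` would then replace the loop in `my_mul`.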