
Which is better: mask_compress + store or mask_compressstoreu?

Tags: simd, avx512

I am using SDE (Intel's emulator) to run AVX-512 code and do not have actual hardware to benchmark on.

For some reason I could not find any information on the comparative performance of compress + store versus compressstore.

compress + store writes the whole register, not just the selected elements, but I am fine with that. compressstore, on the other hand, has to mask out the elements that were not selected.

Which is better? There is no latency information on Intel's website as far as I can see.

Denis Yaroshevskiy Avatar asked Dec 21 '25 10:12



1 Answer

UPD (AMD Zen 4): according to https://www.mersenneforum.org/showthread.php?p=614191, compressstoreu performance on Zen 4 is very poor, so if the code might run on AMD, compressstoreu should be avoided.

I looked in a slightly wrong place: the compress instructions available on my target are the epi32 ones, and those do have latencies listed:

_mm256_mask_compress_epi32 has latency 6

_mm256_mask_compressstoreu_epi32 has latency 11

The others seem to require VBMI2, which is not available on my target.

So it seems that compress + store should be better.
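For reference, the two variants being compared can be sketched roughly as below. This is only an illustration, not a benchmark: the function names (compress_then_store, compress_store, count_selected) are made up for this example, and the scalar fallback is an assumption added so the snippet also compiles when AVX-512F/AVX-512VL are not enabled. The point is the difference in what gets written: the first variant always writes all 8 lanes (so the output needs room for a full register), while the second writes only the selected elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use the intrinsics only when the compiler targets AVX-512F + AVX-512VL
   (e.g. -mavx512f -mavx512vl); otherwise use a scalar fallback.
   The fallback is an assumption of this sketch, added for portability. */
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#define USE_AVX512VL 1
#else
#define USE_AVX512VL 0
#endif

/* Number of set bits in the 8-bit mask = number of selected elements. */
static size_t count_selected(uint8_t mask)
{
    size_t n = 0;
    for (; mask != 0; mask &= (uint8_t)(mask - 1))
        n++;
    return n;
}

/* Variant 1: compress in a register, then do a plain full-width store.
   Always writes 8 ints to `out`; lanes past the selected ones are zero
   because the `src` operand of the compress is a zero vector. */
static size_t compress_then_store(int32_t *out, const int32_t *in, uint8_t mask)
{
    assert(out != NULL && in != NULL);
#if USE_AVX512VL
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    __m256i packed =
        _mm256_mask_compress_epi32(_mm256_setzero_si256(), (__mmask8)mask, v);
    _mm256_storeu_si256((__m256i *)out, packed);
#else
    size_t n = 0;
    for (int i = 0; i < 8; i++)
        if (mask & (1u << i))
            out[n++] = in[i];
    while (n < 8)
        out[n++] = 0;
#endif
    return count_selected(mask);
}

/* Variant 2: compressing store. Writes only the selected elements,
   contiguously, and leaves the rest of `out` untouched. */
static size_t compress_store(int32_t *out, const int32_t *in, uint8_t mask)
{
    assert(out != NULL && in != NULL);
#if USE_AVX512VL
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm256_mask_compressstoreu_epi32(out, (__mmask8)mask, v);
#else
    size_t n = 0;
    for (int i = 0; i < 8; i++)
        if (mask & (1u << i))
            out[n++] = in[i];
#endif
    return count_selected(mask);
}
```

In a loop that filters a long array, the first variant is usually used by advancing the output pointer by the returned count after each store, so the extra lanes are overwritten by the next iteration; the output buffer then needs slack for one full register at the end.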

Denis Yaroshevskiy Avatar answered Dec 23 '25 18:12



