
How to store the lower or upper half of an AVX/AVX2 (YMM) register to memory, like the SSE movlps/movhps instructions do?

Tags: x86, avx, simd, sse, avx2

Is there an existing instruction that can store the lower or upper half of a 256-bit AVX/AVX2 (YMM) register to a memory address, just like the SSE instructions movlps/movhps do?

Or is there any other way to implement this?

Any help would be appreciated, thanks!

Sean Yang asked Jan 30 '13 08:01



1 Answer

Store the low 128 bits with vmovdqu [rdi], xmm0.

Store the high 128 bits with VEXTRACTI128 xmm1/m128, ymm2, 1, using the memory-destination form. You can probably get a compiler to generate the store to memory by assigning the result of an extract intrinsic to a memory reference.
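As a rough illustration of the two stores above, here is a minimal intrinsics sketch. The helper name store_halves and the array-based interface are made up for this example, and the target attribute assumes GCC or Clang (it stands in for compiling with -mavx2). Which instructions actually come out depends on the compiler, so check the generated asm.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not from the answer): load 8 x int32 into one YMM
   register, then store its low and high 128-bit halves separately.
   Assumes GCC/Clang; the attribute enables AVX2 for this function only. */
__attribute__((target("avx2")))
void store_halves(const int32_t src[8], int32_t lo[4], int32_t hi[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);

    /* Low half: the cast just reinterprets the register as xmm,
       so this can become a plain vmovdqu [mem], xmm store. */
    _mm_storeu_si128((__m128i *)lo, _mm256_castsi256_si128(v));

    /* High half: extract with index 1; a compiler may fold the store
       into vextracti128 [mem], ymm, 1. */
    _mm_storeu_si128((__m128i *)hi, _mm256_extracti128_si256(v, 1));
}
```

Calling store_halves on the values 0..7 should leave 0..3 in lo and 4..7 in hi.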

The store form of vextracti128 / vextractf128 takes 2 uops, even in the fused domain (Haswell), so IDK what the point is of having it encodable with an immediate operand of 0. (That changes with AVX512, where an immediate index rather than a movh-style instruction becomes relevant; presumably they didn't know yet that they were going to replace VEX with EVEX for AVX512.) There's no penalty for mixing VEX-encoded instructions that use xmm regs with ones that use ymm regs, so you can just use a 128b store of the xmm version to get the low 128, just like you can get the low 32 bits of a 64b GP reg by referencing eax instead of rax.

It's probably annoying to cast stuff when using intrinsics, so with luck a compiler will compile _mm256_extracti128_si256(vec, 0) to a vmovdqu of the corresponding xmm reg. But if your compiler doesn't, your code will be faster if you get it to generate vmovdqu, e.g. by using a cast intrinsic instead. (vmovdqu is as fast as vmovdqa if the address is aligned, just like non-mov AVX memory access.)
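To make the comparison concrete, here is a small sketch with the two ways of writing the low-half store. The function names are hypothetical, and the target attribute again assumes GCC or Clang. Both variants store the same bytes; whether the extract version compiles down to a plain vmovdqu is up to your compiler, so compare the asm output of the two.

```c
#include <immintrin.h>
#include <stdint.h>

/* Variant A (hypothetical name): extract with index 0. A good compiler
   may turn this into vmovdqu, but it could also emit vextracti128. */
__attribute__((target("avx2")))
void store_low_extract(const int32_t src[8], int32_t dst[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm256_extracti128_si256(v, 0));
}

/* Variant B (hypothetical name): explicit cast to the xmm view of the
   register, which needs no instruction of its own before the store. */
__attribute__((target("avx2")))
void store_low_cast(const int32_t src[8], int32_t dst[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(v));
}
```

If the extract variant turns out to produce vextracti128 on your toolchain, the cast variant is the one to prefer.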

Peter Cordes answered Oct 20 '22 09:10
