Storing individual doubles from a packed double vector using Intel AVX

I'm writing code using the C intrinsics for Intel's AVX instructions. If I have a packed double vector (a __m256d), what is the most efficient way (i.e. the fewest operations) to store each element to a different place in memory? I need to fan the elements out to separate locations so that they are no longer packed. Pseudocode:

__m256d src;   // four packed doubles
double *dst;
int dst_dist;  // stride between destinations, in elements
dst[0] = src[0];
dst[dst_dist] = src[1];
dst[2 * dst_dist] = src[2];
dst[3 * dst_dist] = src[3];

Using SSE, I could do this with __m128 types using the _mm_storel_pi and _mm_storeh_pi intrinsics. I've not been able to find anything similar for AVX that allows me to store the individual 64-bit pieces to memory. Does one exist?

asked Dec 09 '11 03:12

Jason R

1 Answer

You can do it with a couple of extract intrinsics (warning: untested):

__m256d src = ...;  // data

__m128d a = _mm256_extractf128_pd(src, 0);
__m128d b = _mm256_extractf128_pd(src, 1);

_mm_storel_pd(dst + 0*dst_dist, a);
_mm_storeh_pd(dst + 1*dst_dist, a);
_mm_storel_pd(dst + 2*dst_dist, b);
_mm_storeh_pd(dst + 3*dst_dist, b);
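
For anyone who wants to try this, here is a minimal self-contained sketch of the same sequence wrapped in a function. The name `store_strided4` and its pointer-based interface are assumptions made for illustration, not part of the original answer. It also swaps the index-0 extract for `_mm256_castpd256_pd128`, which Intel documents as generating no instruction. The `target("avx")` attribute is a GCC/Clang extension that lets the function compile without `-mavx`; the CPU still has to support AVX at run time.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: writes the four doubles at `packed` to
   dst[0], dst[dst_dist], dst[2*dst_dist], dst[3*dst_dist]. */
__attribute__((target("avx")))
void store_strided4(double *dst, ptrdiff_t dst_dist, const double *packed)
{
    __m256d src = _mm256_loadu_pd(packed);        /* unaligned load of 4 doubles */

    __m128d lo = _mm256_castpd256_pd128(src);     /* elements 0 and 1 */
    __m128d hi = _mm256_extractf128_pd(src, 1);   /* elements 2 and 3 */

    _mm_storel_pd(dst + 0 * dst_dist, lo);        /* element 0 */
    _mm_storeh_pd(dst + 1 * dst_dist, lo);        /* element 1 */
    _mm_storel_pd(dst + 2 * dst_dist, hi);        /* element 2 */
    _mm_storeh_pd(dst + 3 * dst_dist, hi);        /* element 3 */
}
```

Called as `store_strided4(out, 3, in)`, this should leave `in[0]` through `in[3]` at `out[0]`, `out[3]`, `out[6]` and `out[9]`, with the elements in between untouched.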

What you really want is a scatter store. AVX2 adds gather loads but no scatter; scatter stores did not arrive until AVX-512, so on plain AVX the extract-and-store sequence above is the way to go.
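
Scatter stores did eventually ship, as part of AVX-512 rather than AVX2. The following is a rough sketch of what a single-instruction version could look like on hardware with AVX-512F and AVX-512VL, using `_mm256_i64scatter_pd`. The function names `scatter4` and `scatter4_avx512`, the run-time check and the scalar fallback are all assumptions added for illustration. Whether a scatter actually beats four scalar stores depends on the microarchitecture, so measure before adopting it.

```c
#include <immintrin.h>

/* Hypothetical AVX-512 path: one scatter writes all four elements.
   Requires AVX-512F and AVX-512VL at run time. */
__attribute__((target("avx512f,avx512vl")))
static void scatter4_avx512(double *dst, long long dst_dist, const double *packed)
{
    __m256d src = _mm256_loadu_pd(packed);
    /* Element indices 0, d, 2d, 3d; _mm256_set_epi64x takes the highest lane first. */
    __m256i idx = _mm256_set_epi64x(3 * dst_dist, 2 * dst_dist, dst_dist, 0);
    /* Scale 8 converts element indices into byte offsets for doubles. */
    _mm256_i64scatter_pd(dst, idx, src, 8);
}

/* Hypothetical dispatcher: uses the scatter when the CPU supports it,
   otherwise falls back to plain scalar stores. */
void scatter4(double *dst, long long dst_dist, const double *packed)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl")) {
        scatter4_avx512(dst, dst_dist, packed);
    } else {
        for (int i = 0; i < 4; ++i)
            dst[i * dst_dist] = packed[i];
    }
}
```

`__builtin_cpu_supports` and the `target` attribute are GCC/Clang extensions; with other compilers the feature check would have to go through CPUID instead.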

answered Jan 01 '23 12:01

Mysticial

Tags: x86, x86-64, avx, sse
