Hint to compiler that it can use aligned memcpy

I have a struct consisting of seven 256-bit vector values (six __m256 and one __m256i), which is stored 32-byte aligned in memory.

typedef struct
{
        __m256 xl,xh;
        __m256 yl,yh;
        __m256 zl,zh;
        __m256i co;
} bloxset8_t;

I achieve the 32-byte alignment by using posix_memalign() for dynamically allocated data, or __attribute__((aligned(32))) for statically allocated data.
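For reference, the two allocation paths described above might look roughly like this. This is only a sketch: the helper names bloxset8_alloc and is_aligned32 and the array static_sets are hypothetical and not part of the original code, and the attribute syntax assumes GCC or clang.

```c
#define _POSIX_C_SOURCE 200112L  /* makes posix_memalign() visible in strict modes */
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
        __m256 xl,xh;
        __m256 yl,yh;
        __m256 zl,zh;
        __m256i co;
} bloxset8_t;

/* Statically allocated data, explicitly aligned to 32 bytes. */
static bloxset8_t static_sets[4] __attribute__((aligned(32)));

/* Hypothetical helper: allocate `count` structs on a 32-byte boundary.
   Returns NULL on failure; release the result with free(). */
static bloxset8_t *bloxset8_alloc(size_t count)
{
        void *p = NULL;
        if (posix_memalign(&p, 32, count * sizeof(bloxset8_t)) != 0)
                return NULL;
        return p;
}

/* Hypothetical helper: nonzero if p sits on a 32-byte boundary. */
static int is_aligned32(const void *p)
{
        return ((uintptr_t)p % 32) == 0;
}
```

A caller could check the result with something like assert(is_aligned32(ptr)) right after allocation.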

The alignment is fine, but when I pass two pointers to such structs as destination and source to memcpy(), the compiler ends up calling __memcpy_avx_unaligned() to do the copy.

How can I force clang to use the aligned avx memcpy function instead, which I assume is the faster variant?

OS: Ubuntu 16.04.3 LTS, Clang: 3.8.0-2ubuntu4.

UPDATE
__memcpy_avx_unaligned() is invoked only when copying two or more structs. When copying just one, clang emits 14 vmovups instructions.

asked Nov 10 '17 22:11

Bram

1 Answer

__memcpy_avx_unaligned is just an internal glibc function name. It does not mean that there is a faster __memcpy_avx_aligned function. The name merely gives the glibc developers a hint about how this memcpy variant is implemented.

The other question is whether it would be faster for the C compiler to emit an inline expansion of memcpy, using seven 32-byte AVX loads and seven stores per struct. The code for that would be larger than the memcpy call, but it might still be faster overall. It may be possible to help the compiler do this using the __builtin_assume_aligned builtin.
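A minimal sketch of how that hint could be applied, assuming GCC or clang (both provide __builtin_assume_aligned). The wrapper name bloxset8_copy is hypothetical. Whether the hint actually changes the generated code depends on the compiler version and optimization flags; with a run-time element count the compiler may well still call memcpy.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
        __m256 xl,xh;
        __m256 yl,yh;
        __m256 zl,zh;
        __m256i co;
} bloxset8_t;

/* Hypothetical wrapper: copy n structs while promising the compiler that
   both pointers are 32-byte aligned. Passing a misaligned pointer here is
   undefined behavior, so the promise must really hold. */
static void bloxset8_copy(bloxset8_t *dst, const bloxset8_t *src, size_t n)
{
        bloxset8_t *d = __builtin_assume_aligned(dst, 32);
        const bloxset8_t *s = __builtin_assume_aligned(src, 32);

        memcpy(d, s, n * sizeof(bloxset8_t));
}
```

Comparing the output of clang -O2 -mavx -S with and without the hint shows whether aligned moves (vmovaps) are emitted for a given build.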

answered Oct 26 '22 23:10

Florian Weimer

Tags:

c

avx

glibc

memory-alignment

memcpy
