
Hint to compiler that it can use aligned memcpy

I have a struct consisting of seven 256-bit AVX values (six __m256 and one __m256i), which is stored 32-byte aligned in memory.

typedef struct
{
        __m256 xl,xh;
        __m256 yl,yh;
        __m256 zl,zh;
        __m256i co;
} bloxset8_t;

I achieve the 32-byte alignment by using the posix_memalign() function for dynamically allocated data, or the __attribute__((aligned(32))) attribute for statically allocated data.
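For reference, the two allocation approaches described above might look roughly like the following. This is only a sketch: `bloxset8_alloc` and `static_set` are hypothetical names chosen for illustration, not taken from the original code.

```c
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign() under strict -std modes */
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
        __m256 xl,xh;
        __m256 yl,yh;
        __m256 zl,zh;
        __m256i co;
} bloxset8_t;

/* Statically allocated instance, explicitly requested on a 32-byte boundary. */
static bloxset8_t static_set __attribute__((aligned(32)));

/* Hypothetical helper: allocate `count` structs on a 32-byte boundary.
 * Returns NULL on failure; release the result with free(). */
static bloxset8_t *bloxset8_alloc(size_t count)
{
        void *p = NULL;

        if (posix_memalign(&p, 32, count * sizeof(bloxset8_t)) != 0)
                return NULL;
        return (bloxset8_t *)p;
}
```

A caller would check the result for NULL and could verify the alignment with something like `assert(((uintptr_t)p % 32) == 0)` in debug builds.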

The alignment is fine, but when I use two pointers to such a struct, and pass them as destination and source for memcpy() then the compiler decides to use __memcpy_avx_unaligned() to copy.

How can I force clang to use the aligned avx memcpy function instead, which I assume is the faster variant?

OS: Ubuntu 16.04.3 LTS, Clang: 3.8.0-2ubuntu4.

UPDATE
__memcpy_avx_unaligned() is invoked only when copying two or more structs. When copying just one, clang emits 14 vmovups instructions.

asked Nov 10 '17 22:11 by Bram



1 Answer

__memcpy_avx_unaligned is just an internal glibc function name. It does not mean that there is a faster __memcpy_avx_aligned function. The name merely conveys a hint to the glibc developers about how this memcpy variant is implemented.

The other question is whether it would be faster for the C compiler to emit an inline expansion of memcpy, using a sequence of 32-byte AVX load/store operations (seven loads and seven stores per 224-byte struct). The code for that would be larger than the memcpy call, but it might still be faster overall. It may be possible to help the compiler do this using the __builtin_assume_aligned builtin.
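A minimal sketch of that idea, assuming both pointers really are 32-byte aligned. The helper name `bloxset8_copy` is hypothetical; `__builtin_assume_aligned` is a GCC/Clang builtin. Whether the compiler then inlines the copy or still calls into glibc depends on the optimization level and on whether the size is a compile-time constant, so it is worth checking the generated assembly.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
        __m256 xl,xh;
        __m256 yl,yh;
        __m256 zl,zh;
        __m256i co;
} bloxset8_t;

/* Hypothetical helper: copy `count` structs while promising the compiler
 * that both pointers are 32-byte aligned. */
static void bloxset8_copy(bloxset8_t *dst, const bloxset8_t *src, size_t count)
{
        /* Debug-build check: the promise below is undefined behaviour if false. */
        assert(((uintptr_t)dst % 32) == 0 && ((uintptr_t)src % 32) == 0);

        /* The builtin returns its argument, annotated with the alignment hint. */
        bloxset8_t *d = __builtin_assume_aligned(dst, 32);
        const bloxset8_t *s = __builtin_assume_aligned(src, 32);

        memcpy(d, s, count * sizeof(bloxset8_t));
}
```

The hint only gives the optimizer permission to use aligned moves; it does not change which glibc memcpy variant is selected at run time if a library call is still emitted.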

answered Oct 26 '22 23:10 by Florian Weimer