I'm a SIMD beginner. I've read this article about the topic (since I'm using an AVX2-compatible machine).
I've also read in this question how to check whether a pointer is aligned.
I'm testing it with this toy example, main.cpp:
#include <cstdint>   // uintptr_t
#include <iostream>
#include <immintrin.h>

// True if the address POINTER is a multiple of BYTE_COUNT.
#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)

int main()
{
    float a[8];
    for(int i=0; i<8; i++){
        a[i]=i;
    }
    __m256 evens = _mm256_set_ps(2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0);
    std::cout<<is_aligned(a, 16)<<" "<<is_aligned(&evens, 16)<<std::endl;
    std::cout<<is_aligned(a, 32)<<" "<<is_aligned(&evens, 32)<<std::endl;
}
I compile it with icpc -std=c++11 -o main main.cpp.
The output is:
1 1
1 1
However, if I add these 3 lines before the 4 prints:
for(int i=0; i<8; i++)
std::cout<<a[i]<<" ";
std::cout<<std::endl;
This is the result:
0 1 2 3 4 5 6 7
1 1
0 1
In particular, I don't understand that last 0. Why is it different from the previous output? What am I missing?
The GNU documentation states that memory returned by malloc is aligned to a multiple of 16 bytes on 64-bit systems.
Your is_aligned (which is a macro, not a function) determines whether the object happens to be aligned to a particular boundary. It does not determine the alignment requirement of the object's type.
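To make the distinction concrete, here is a minimal sketch that puts the two notions side by side. The function name address_is_aligned is hypothetical (it is just a function version of your macro); alignof is the standard C++11 operator that reports the requirement of a type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (function version of the is_aligned macro):
// true if the address p is a multiple of byte_count.
inline bool address_is_aligned(const void* p, std::size_t byte_count) {
    return reinterpret_cast<std::uintptr_t>(p) % byte_count == 0;
}

// What the language guarantees for every float object (typically 4).
// This is a property of the type, not of any particular address.
constexpr std::size_t float_requirement = alignof(float);

// For any float array, this check is guaranteed to succeed ...
inline bool meets_requirement(const float* p) {
    return address_is_aligned(p, float_requirement);
}

// ... while this one may succeed or fail depending on where the
// array happens to be placed.
inline bool happens_to_be_32_aligned(const float* p) {
    return address_is_aligned(p, 32);
}
```

Calling meets_requirement(a) on your array always yields true, whereas happens_to_be_32_aligned(a) is the check whose result changed between your two runs.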
The compiler guarantees that a float array is aligned to at least the alignment requirement of float, which is typically 4 bytes. Being a multiple of 4 does not imply being a multiple of 32, so there is no guarantee that the array sits on a 32-byte boundary. However, every address divisible by 32 is also divisible by 4, so an array on a 4-byte boundary may happen to land on a 32-byte boundary as well. This is what happened in your first test, but as explained, nothing guarantees it. In your second test you added some code, the compiler laid out the stack differently, and the array ended up at another memory location, which happened not to be on a 32-byte boundary.
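The arithmetic behind "possible but not guaranteed" can be sketched as follows. The function count_on_boundary is a hypothetical helper written only for this illustration; it walks over plain integers standing in for addresses.

```cpp
#include <cstddef>

// Hypothetical helper: among the multiples of `step` in [0, limit),
// count how many are also multiples of `boundary`.
inline std::size_t count_on_boundary(std::size_t limit,
                                     std::size_t step,
                                     std::size_t boundary) {
    std::size_t count = 0;
    for (std::size_t addr = 0; addr < limit; addr += step) {
        if (addr % boundary == 0) {
            ++count;
        }
    }
    return count;
}
```

For example, count_on_boundary(256, 4, 32) returns 8: of the 64 four-byte-aligned addresses below 256, only 8 also sit on a 32-byte boundary. A float array placed without further constraints can therefore easily miss that boundary.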
To request a stricter alignment that may be required by SIMD instructions, you can use the alignas specifier:
alignas(32) float a[8];
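As a minimal sketch of how this could look in practice (the names Vec8 and local_array_on_32_byte_boundary are hypothetical, introduced only for illustration), alignas can be applied either to a single variable or to a type, in which case every object of that type gets the stricter alignment:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper type: every Vec8 object is 32-byte aligned.
struct alignas(32) Vec8 {
    float v[8];
};

static_assert(alignof(Vec8) == 32, "Vec8 must require 32-byte alignment");

// Checks the address of a local array declared with alignas(32).
// With the specifier in place the result no longer depends on what
// other code or variables surround the array.
inline bool local_array_on_32_byte_boundary() {
    alignas(32) float a[8];
    for (int i = 0; i < 8; ++i) {
        a[i] = static_cast<float>(i);
    }
    return reinterpret_cast<std::uintptr_t>(a) % 32 == 0;
}
```

With the array declared this way, the is_aligned(a, 32) check in your program should print 1 whether or not the extra printing loop is present.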