Looking at the AVX2 intrinsics documentation, there are gathered load instructions such as VPGATHERDD:
__m128i _mm_i32gather_epi32 (int const * base, __m128i index, const int scale);
What isn't clear to me from the documentation is whether the calculated load address is an element address or a byte address, i.e. is the load address for element i:
load_addr = base + index[i] * scale; // (1) element addressing ?
or:
load_addr = (char *)base + index[i] * scale; // (2) byte addressing ?
From the Intel docs it looks like it might be (2), but this doesn't make much sense given that the smallest element size for gathered loads is 32 bits - why would you want to load from misaligned addresses (i.e. use scale < 4) ?
Gather instructions do not have any alignment requirements, so it would be too restrictive not to allow byte addressing.
Another reason is consistency. With SIB addressing we obviously have a byte address:
MOV eax, [rcx + rdx * 2]
Since VPGATHERDD is just a vectorized variant of this MOV instruction, we should not expect anything different with VSIB addressing:
VPGATHERDD ymm0, [rcx + ymm2 * 2], ymm3
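To make the address calculation concrete, here is a minimal scalar sketch of what the four-lane gather does under byte addressing. The function name gather_epi32_model is hypothetical (it is not an Intel API), it assumes every lane is enabled, and the example values further assume a little-endian machine such as x86:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model, not an Intel API: emulates the four lanes of
   _mm_i32gather_epi32 with all lanes enabled. */
static void gather_epi32_model(const void *base, const int32_t index[4],
                               int scale, int32_t out[4])
{
    /* The scale must be one of the values a (V)SIB byte can encode. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < 4; ++i) {
        /* Byte addressing: the signed 32-bit index is widened, multiplied
           by scale, and added to base as a byte offset. */
        const char *addr = (const char *)base + (int64_t)index[i] * scale;

        /* memcpy models a 32-bit load with no alignment requirement. */
        memcpy(&out[i], addr, sizeof out[i]);
    }
}
```

With scale = 4 and an int array as base, this gives the same addresses as element addressing, so option (1) from the question is just the scale = 4 special case of option (2).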
As for real-life uses of byte addressing, consider a 24-bit color image where each pixel occupies 3 bytes, so consecutive pixels start 3 bytes apart. We could load 8 pixels with a single VPGATHERDD instruction, but only if the "scale" field in VSIB is 1 and VPGATHERDD uses byte addressing.
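A rough sketch of the pixel case, using the 128-bit intrinsic from the question (four pixels rather than eight). The function name gather_rgb24_x4 is made up for illustration. The target attribute assumes GCC or Clang, and the caller is assumed to have checked that the CPU supports AVX2. Each lane reads 4 bytes, so the buffer must have at least one readable byte after the last pixel:

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: loads four packed 24-bit pixels into four 32-bit lanes.
   target("avx2") is a GCC/Clang extension that enables AVX2 for this function
   without compiling the whole file with -mavx2. */
__attribute__((target("avx2")))
static void gather_rgb24_x4(const uint8_t *pixels, uint32_t out[4])
{
    assert(pixels && out);

    /* Byte offsets of pixels 0..3: each pixel is 3 bytes wide. */
    const __m128i idx = _mm_setr_epi32(0, 3, 6, 9);

    /* scale = 1, so the indices are used directly as byte offsets. */
    __m128i v = _mm_i32gather_epi32((const int *)pixels, idx, 1);

    /* Each lane loaded 4 bytes; clear the top byte, which belongs to the
       next pixel (or to padding after the last one). */
    v = _mm_and_si128(v, _mm_set1_epi32(0x00FFFFFF));

    _mm_storeu_si128((__m128i *)out, v);
}
```

With element addressing and a fixed 4-byte stride, offsets 3, 6 and 9 could not be expressed at all, which is why scale = 1 is needed here.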