
Load address calculation when using AVX2 gather instructions

Tags: x86, simd, sse, avx2

Looking at the AVX2 intrinsics documentation there are gathered load instructions such as VPGATHERDD:

__m128i _mm_i32gather_epi32 (int const * base, __m128i index, const int scale);

What isn't clear to me from the documentation is whether the calculated load address is an element address or a byte address, i.e. is the load address for element i:

load_addr = base + index[i] * scale;               // (1) element addressing ?

or:

load_addr = (char *)base + index[i] * scale;       // (2) byte addressing ?

From the Intel docs it looks like it might be (2), but this doesn't make much sense given that the smallest element size for gathered loads is 32 bits. Why would you want to load from misaligned addresses (i.e. use scale < 4)?

Paul R asked Apr 24 '13 13:04



1 Answer

The load address is a byte address, i.e. form (2). Gather instructions do not have any alignment requirements, so it would be too restrictive not to allow byte addressing.

Another reason is consistency. With SIB addressing we obviously have a byte address:

MOV eax, [rcx + rdx * 2]

Since VPGATHERDD is just a vectorized variant of this MOV instruction, we should not expect anything different with VSIB addressing:

VPGATHERDD ymm0, [rcx + ymm2 * 2], ymm3
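
To make the byte-address rule concrete, here is a minimal scalar sketch of the per-element address calculation. It is only a model: `gather_model_epi32` is a hypothetical helper name, not an Intel intrinsic, and it ignores the mask operand of the real instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the address calculation of a 32-bit gather.
 * Hypothetical helper for illustration; the mask operand is not modelled. */
static void gather_model_epi32(const void *base, const int32_t *index,
                               int scale, int32_t *out, int n)
{
    /* VSIB, like SIB, can only encode scale factors 1, 2, 4 and 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < n; i++) {
        /* Byte addressing: the sign-extended index times scale is a
         * byte offset from base, not an element offset. */
        const char *addr = (const char *)base + (int64_t)index[i] * scale;

        /* memcpy performs a 4-byte load with no alignment requirement. */
        memcpy(&out[i], addr, sizeof out[i]);
    }
}
```

Under this model, passing element indices with scale = 4 (that is, `sizeof(int)`) gives ordinary array indexing, which is how `_mm_i32gather_epi32(base, index, 4)` is typically used. Passing byte offsets with scale = 1 reaches any address, aligned or not.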

As for a real-life use of byte addressing, consider a 24-bit color image, where each pixel occupies 3 bytes and therefore starts at a multiple of 3 bytes. We could load 8 pixels with a single VPGATHERDD instruction, but only if the "scale" field in VSIB is 1 and VPGATHERDD uses byte addressing.
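
The 24-bit pixel case could look roughly like the sketch below. The function name `load8_rgb24` and the one-pixel-per-lane output layout are assumptions made for illustration. The AVX2 path is used only when the compiler defines `__AVX2__` (for example with `-mavx2`); otherwise a scalar fallback does the same byte-offset loads. Since each lane reads 4 bytes, the sketch assumes at least one readable byte follows the eighth pixel, and it assumes a little-endian target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Load 8 packed 3-byte pixels starting at src, one pixel per 32-bit lane.
 * Hypothetical helper. Assumes 25 readable bytes at src and little-endian
 * byte order, so each result holds the pixel in its low 24 bits. */
static void load8_rgb24(const uint8_t *src, uint32_t out[8])
{
    assert(src != NULL && out != NULL);
#ifdef __AVX2__
    /* Byte offsets of the 8 pixels; scale = 1 keeps them as byte offsets. */
    const __m256i offsets = _mm256_setr_epi32(0, 3, 6, 9, 12, 15, 18, 21);
    __m256i px = _mm256_i32gather_epi32((const int *)src, offsets, 1);

    /* Each lane also picked up the first byte of the next pixel; clear it. */
    px = _mm256_and_si256(px, _mm256_set1_epi32(0x00FFFFFF));
    _mm256_storeu_si256((__m256i *)out, px);
#else
    /* Scalar fallback with the same addressing: 4-byte load at 3 * i. */
    for (int i = 0; i < 8; i++) {
        uint32_t v;
        memcpy(&v, src + 3 * i, sizeof v);
        out[i] = v & 0x00FFFFFFu;
    }
#endif
}
```

Note that a scale of 3 cannot be encoded, so the multiplication by 3 has to be folded into the index vector, leaving scale = 1.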

Evgeny Kluev answered Sep 28 '22 09:09
