Suppose I have an array:
uint8_t arr[256];
and an element __m128i x containing 16 bytes, x_1, x_2, ... x_16. I would like to efficiently fill a new element __m128i y with values from arr depending on the values in x, such that:
y_1 = arr[x_1]
y_2 = arr[x_2]
...
y_16 = arr[x_16]
A command to achieve this would essentially be loading a register from a non-contiguous set of memory locations. I have a painfully vague memory of having seen documentation of such a command, but can't find it now. Does it exist? Thanks in advance for your help.
This kind of capability in SIMD architectures is known as load/store scatter/gather. Unfortunately SSE does not have it. Later Intel SIMD designs do: the ill-fated Larrabee processor was an early case in point, AVX2 added gather loads, and AVX-512 added scatter stores, but these operate on 32-bit and 64-bit elements rather than bytes. With SSE alone you will need to design your data structures in such a way that this kind of functionality is not needed.
Note that you can achieve the equivalent effect by using e.g. _mm_set_epi8:
y = _mm_set_epi8(arr[x_16], arr[x_15], arr[x_14], ..., arr[x_1]);
although of course this will just generate a bunch of scalar code to load your y vector. This is fine if you are doing this kind of operation outside any performance-critical loops, e.g. as part of initialisation prior to looping, but inside a loop it is likely to be a performance-killer.
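To make the scalar fallback concrete, here is a minimal sketch of the same idea written as a loop. It assumes SSE2 is available; the helper name gather_bytes is hypothetical and not part of any library. It spills the indices to memory, does 16 ordinary table lookups, and reloads the result, which is roughly the work the compiler has to generate for the _mm_set_epi8 version anyway.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_loadu_si128, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the question):
 * returns y such that byte i of y equals arr[byte i of x]. */
static __m128i gather_bytes(const uint8_t arr[256], __m128i x)
{
    uint8_t idx[16];
    uint8_t out[16];

    /* Spill the 16 index bytes to memory so they can be read one at a time. */
    _mm_storeu_si128((__m128i *)idx, x);

    /* 16 scalar lookups; a uint8_t index can never exceed 255,
     * so every access stays inside the 256-entry table. */
    for (int i = 0; i < 16; ++i)
        out[i] = arr[idx[i]];

    /* Reload the looked-up bytes as a vector. */
    return _mm_loadu_si128((const __m128i *)out);
}
```

Calling y = gather_bytes(arr, x); gives y_1 = arr[x_1] in the lowest byte, matching the argument order in the _mm_set_epi8 call above. It is still scalar work, so the same performance caveat applies inside hot loops.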