I have 3D data stored in a 1D array. I compute the 1D index like this:
index = i + j * WIDTH + k * WIDTH * HEIGHT
Then I need to get the original i, j, k indexes back from index. The obvious way to do this is something like this:
k = index / (WIDTH * HEIGHT)
j = (index % (WIDTH * HEIGHT)) / WIDTH
i = index - j * WIDTH - k * WIDTH * HEIGHT
But I wonder, is there some more efficient way to do this? At least without the modulo...
Context of this question: I have a kernel in CUDA where I access the data and compute the i, j, k indexes (the index corresponds to a unique thread ID). So maybe there is some CUDA-specific way to do this? I guess this is quite a common problem, but I couldn't find a better way to do this...
Thanks for your ideas!
What you've got is fine; if you want to avoid the modulo (since that's very expensive on GPUs) you can just do with j what you've done with i:
j = (index - (k*WIDTH*HEIGHT))/WIDTH
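Put together, the modulo-free decomposition could look like the sketch below. It is written as plain C++ so it can be checked on the host, and it assumes non-negative indexes and integer division. The names Index3, flatten and unflatten are made up for this example and do not come from the question.

```cpp
#include <cstddef>

// Hypothetical helper type for this sketch: holds the recovered i, j, k.
struct Index3 {
    std::size_t i, j, k;
};

// Forward mapping from the question: index = i + j*WIDTH + k*WIDTH*HEIGHT.
inline std::size_t flatten(std::size_t i, std::size_t j, std::size_t k,
                           std::size_t width, std::size_t height) {
    return i + j * width + k * width * height;
}

// Inverse mapping using only division, multiplication and subtraction.
inline Index3 unflatten(std::size_t index, std::size_t width, std::size_t height) {
    const std::size_t slice = width * height;    // elements per k-slice
    const std::size_t k = index / slice;
    const std::size_t rest = index - k * slice;  // same value as index % slice
    const std::size_t j = rest / width;
    const std::size_t i = rest - j * width;      // same value as rest % width
    return {i, j, k};
}
```

For example, unflatten(flatten(3, 4, 5, 100, 100), 100, 100) gives back i = 3, j = 4, k = 5. In a CUDA build the same function bodies should work in device code if the functions are marked `__host__ __device__`.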
If you want the logic to be a little clearer, and don't need the original index, you can do
k = index/(WIDTH*HEIGHT);
index -= k*WIDTH*HEIGHT;
j = index/WIDTH;
index -= j*WIDTH;
i = index/1;
which is then pretty straightforwardly extended to arbitrary dimensions. You can try tweaking the above by doing things like precomputing WIDTH*HEIGHT, say, but I'd just turn up optimization and trust the compiler to do that for you.
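To illustrate the extension to arbitrary dimensions, here is a rough sketch of the same divide-and-subtract steps in a loop. The function name unflatten_nd and the convention that dims[0] is the fastest-varying extent (WIDTH in the 3D case) are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>

// Decompose a flat index into N coordinates. dims[0] is the fastest-varying
// extent (WIDTH), dims[1] the next (HEIGHT), and so on.
template <std::size_t N>
std::array<std::size_t, N> unflatten_nd(std::size_t index,
                                        const std::array<std::size_t, N>& dims) {
    // strides[d] is the product of dims[0..d-1], so strides[0] == 1.
    std::array<std::size_t, N> strides{};
    std::size_t stride = 1;
    for (std::size_t d = 0; d < N; ++d) {
        strides[d] = stride;
        stride *= dims[d];
    }

    // Peel off the slowest-varying coordinate first, as with k, then j, then i.
    std::array<std::size_t, N> coords{};
    for (std::size_t d = N; d-- > 0;) {
        coords[d] = index / strides[d];
        index -= coords[d] * strides[d];
    }
    return coords;
}
```

With dims = {100, 100, 50}, calling unflatten_nd<3>(50403, dims) yields the coordinates {3, 4, 5}. The last step divides by a stride of 1, which matches the i = index/1 line above.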
The suggestions about rounding up to a power of 2 are correct in the sense that it would speed up the index calculation, but at quite some cost. In this (not too bad) case, WIDTH=HEIGHT=100, it would increase the memory requirements of your 3D array by about 64% (WIDTH=HEIGHT=128), and memory on the GPU is generally already tight. Making your array sizes powers of two might also introduce problems with bank conflicts, depending on your access patterns.
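For comparison, this is roughly what the power-of-two variant and its memory cost look like. It is only a sketch under the assumption that WIDTH and HEIGHT are both padded to 128 = 2^7. The constants and the names unflatten_pow2 and padding_overhead are invented for this example.

```cpp
// Assumption for this sketch: WIDTH and HEIGHT are both padded to 128 = 2^7.
constexpr unsigned kLog2Width = 7;
constexpr unsigned kLog2Height = 7;

// Hypothetical helper type: holds the recovered i, j, k.
struct Index3 {
    unsigned i, j, k;
};

// With power-of-two extents, division becomes a shift and modulo becomes a mask.
inline Index3 unflatten_pow2(unsigned index) {
    const unsigned i = index & ((1u << kLog2Width) - 1u);
    const unsigned j = (index >> kLog2Width) & ((1u << kLog2Height) - 1u);
    const unsigned k = index >> (kLog2Width + kLog2Height);
    return {i, j, k};
}

// Relative memory growth when a width x height slice is padded to
// padded_width x padded_height (0.5 would mean 50% more memory).
constexpr double padding_overhead(double width, double height,
                                  double padded_width, double padded_height) {
    return (padded_width * padded_height) / (width * height) - 1.0;
}
```

Here padding_overhead(100, 100, 128, 128) evaluates to 0.6384, so the shifts and masks are paid for with well over half again as much memory per slice.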