Efficient way to compute 3D indexes from 1D array representation

Tags: c++, math, cuda

I have 3D data stored in a 1D array. I compute the 1D index like this:

index = i + j * WIDTH + k * WIDTH * HEIGHT

Then I need to get the original i, j, k indexes back from index. The obvious way to do this is something like:

k = index / (WIDTH * HEIGHT) 
j = (index % (WIDTH * HEIGHT)) / WIDTH
i = index - j * WIDTH - k * WIDTH * HEIGHT

But I wonder whether there is a more efficient way to do this, at least one without the modulo.

Context of this question: I have a CUDA kernel where I access the data and compute the i, j, k indexes (index corresponds to the unique thread ID). So maybe there is some CUDA-specific way to do this? I guess this is quite a common problem, but I couldn't find a better way to do it.

Thanks for your ideas!

asked Dec 15 '12 16:12

Jaa-c

1 Answer

What you've got is fine. If you want to avoid the modulo (since that's very expensive on GPUs), you can just do with j what you've done with i:

j = (index - (k*WIDTH*HEIGHT))/WIDTH
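
To make the modulo-free version concrete, here is a minimal C++ sketch that wraps it in a function. The names `Index3D` and `unflatten` are made up for illustration and do not come from the question; `width` and `height` stand in for WIDTH and HEIGHT. The same body should work unchanged inside a CUDA device function, though that is an assumption rather than something measured here.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not part of the original post.
struct Index3D {
    int i, j, k;
};

// Recover (i, j, k) from index = i + j*width + k*width*height
// using two integer divisions and no modulo operator.
inline Index3D unflatten(int index, int width, int height) {
    assert(index >= 0 && width > 0 && height > 0);
    const int slice = width * height;   // elements in one k-plane
    const int k = index / slice;
    const int rem = index - k * slice;  // same value as index % slice
    const int j = rem / width;
    const int i = rem - j * width;      // same value as rem % width
    return {i, j, k};
}
```

For example, with `width = height = 100`, calling `unflatten(50403, 100, 100)` gives `i = 3`, `j = 4`, `k = 5`, since 50403 = 3 + 4*100 + 5*100*100.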

If you want the logic to be a little clearer, and don't need the original index, you can do

k = index/(WIDTH*HEIGHT); 
index -= k*WIDTH*HEIGHT; 

j = index/WIDTH; 
index -= j*WIDTH; 

i = index/1;

which is then pretty straightforwardly extended to arbitrary dimensions. You can try tweaking the above by doing things like precomputing WIDTH*HEIGHT, say, but I'd just turn up optimization and trust the compiler to do that for you.
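
As a rough sketch of that extension, the same divide-then-subtract step can be looped over the dimensions from slowest to fastest. The function name `unflatten_nd` and the convention that `dims[0]` is the fastest-varying extent (WIDTH in the question) are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical generalization of the divide-then-subtract scheme.
// dims[0] is the fastest-varying extent, dims[N-1] the slowest.
template <std::size_t N>
std::array<int, N> unflatten_nd(int index, const std::array<int, N>& dims) {
    static_assert(N > 0, "need at least one dimension");

    // strides[d] = dims[0] * ... * dims[d-1], with strides[0] = 1.
    std::array<int, N> strides{};
    strides[0] = 1;
    for (std::size_t d = 1; d < N; ++d) {
        strides[d] = strides[d - 1] * dims[d - 1];
    }

    // Peel off the slowest coordinate first, then work down to the fastest.
    std::array<int, N> coords{};
    for (std::size_t d = N; d-- > 0;) {
        assert(strides[d] > 0);
        coords[d] = index / strides[d];
        index -= coords[d] * strides[d];
    }
    return coords;
}
```

With `dims = {WIDTH, HEIGHT, DEPTH}` this reduces to the three-step version above; the extent of the last dimension is never needed for the division, only for sizing the array.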

The suggestions about rounding up to a power of 2 are correct in the sense that it would speed up the index calculation, but at quite some cost. In this (not too bad) case, WIDTH=HEIGHT=100, padding to WIDTH=HEIGHT=128 would increase the memory requirements of your 3D array by roughly 64% (128*128 versus 100*100 elements per slice), and memory on a GPU is generally already tight. Making your arrays powers-of-two size might also well introduce problems with bank conflicts, depending on your access patterns.
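
For comparison, this is roughly what the power-of-two variant looks like, as a sketch only: it assumes WIDTH and HEIGHT are both padded to 128 (1 << 7), and the names `PaddedIndex3D`, `flatten_pow2` and `unflatten_pow2` are invented for the example. Divisions become shifts and the remainders become bit masks; the last constant shows the storage ratio you pay for the padding.

```cpp
#include <cassert>

// Assumed padded extents: 128 = 1 << 7 for both WIDTH and HEIGHT.
constexpr unsigned LOG2_WIDTH = 7;
constexpr unsigned LOG2_HEIGHT = 7;
constexpr unsigned WIDTH_MASK = (1u << LOG2_WIDTH) - 1;    // 127
constexpr unsigned HEIGHT_MASK = (1u << LOG2_HEIGHT) - 1;  // 127

// Hypothetical helper type for illustration.
struct PaddedIndex3D {
    unsigned i, j, k;
};

// index = i + j*128 + k*128*128, written with shifts.
inline unsigned flatten_pow2(unsigned i, unsigned j, unsigned k) {
    assert(i <= WIDTH_MASK && j <= HEIGHT_MASK);
    return i | (j << LOG2_WIDTH) | (k << (LOG2_WIDTH + LOG2_HEIGHT));
}

// Inverse mapping: shifts replace the divisions, masks replace the modulo.
inline PaddedIndex3D unflatten_pow2(unsigned index) {
    return {index & WIDTH_MASK,
            (index >> LOG2_WIDTH) & HEIGHT_MASK,
            index >> (LOG2_WIDTH + LOG2_HEIGHT)};
}

// Storage per k-plane after padding 100x100 up to 128x128: 1.6384x.
constexpr double padding_ratio = (128.0 * 128.0) / (100.0 * 100.0);
```

Whether the cheaper index arithmetic outweighs the extra memory and the possible bank conflicts depends on the kernel, so it is worth profiling both layouts before committing to the padded one.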

answered Nov 07 '22 09:11

Jonathan Dursi
