
Coalesced global memory writes using hash

Tags: c++, c, cuda, gpgpu

My question concerns the coalesced global writes to a dynamically changing set of elements of an array in CUDA. Consider the following kernel:

__global__ void
kernel (int n, int *odata, int *idata, int *hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[hash[i]] = idata[i];
}

Here the first n elements of the array hash contain the indices of odata to be updated from the first n elements of idata. Obviously this leads to a terrible, terrible lack of coalescence, since consecutive threads write to unrelated addresses. In the case of my code, the hash at one kernel invocation is completely unrelated to the hash at another (and other kernels update the data in other ways), so simply reordering the data to optimize this particular kernel isn't an option.
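To make the coalescing problem concrete, here is a minimal host-side sketch (not part of the original code) that counts how many distinct memory segments one warp's writes to odata would touch for a given hash. The helper name segments_touched_by_warp is hypothetical, and the 128-byte segment size is an assumption: it matches the cache-line granularity on many NVIDIA GPUs, but the actual transaction size varies by architecture.

```cuda
#include <set>
#include <vector>

// Hypothetical helper: counts the distinct 128-byte segments of odata that
// one warp (32 consecutive threads starting at index warp_start) writes to.
// Assumptions: 4-byte int, 128-byte-aligned odata, 128-byte segments.
int segments_touched_by_warp(const std::vector<int> &hash, int warp_start)
{
  const int kWarpSize = 32;
  const int kIntsPerSegment = 128 / sizeof(int);  // 32 ints per segment
  std::set<int> segments;
  for (int lane = 0; lane < kWarpSize; ++lane) {
    int i = warp_start + lane;
    if (i < (int)hash.size())
      segments.insert(hash[i] / kIntsPerSegment);
  }
  return (int)segments.size();
}
```

With an identity hash (hash[i] == i) every warp touches a single segment, which is the fully coalesced case. With an effectively random hash the count approaches 32, one segment per thread.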

Is there some feature in CUDA which would allow me to improve the performance of this situation? I hear a lot of talk about texture memory, but I've not been able to translate what I've read into a solution for this problem.

coastal asked Jan 18 '26 08:01



1 Answer

Texturing is a read-only mechanism, so it cannot directly improve the performance of scattered writes to global memory. If your algorithm could be transformed so that the "hashing" happens on the read side instead:

odata[i] = idata[hash[i]];

then there might be some benefit to considering a texture mechanism, since the writes would be coalesced and only the reads scattered. (Your example appears to be 1D in nature.)
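As a rough sketch of that gather form: the kernel below treats hash as source indices into idata, which is not the same mapping as in the question's kernel. It uses __ldg() and const __restrict__ qualifiers, which route loads through the read-only data cache on devices of compute capability 3.5 and later. That is a swapped-in alternative to an explicitly bound texture, not the texture API itself. The names gather_kernel and run_gather are hypothetical, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Gather variant: thread i writes odata[i], so writes are coalesced.
// Only the reads from idata are scattered; __ldg() requests that they be
// serviced through the read-only data cache (compute capability 3.5+).
__global__ void
gather_kernel (int n, int *__restrict__ odata,
               const int *__restrict__ idata,
               const int *__restrict__ hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[i] = __ldg(&idata[hash[i]]);
}

// Hypothetical host wrapper for illustration only (no error checking).
// Every hash[i] must be a valid index into idata.
std::vector<int> run_gather(const std::vector<int> &idata,
                            const std::vector<int> &hash)
{
  int n = (int)hash.size();
  std::vector<int> out(n);
  if (n == 0)
    return out;

  int *d_in, *d_hash, *d_out;
  cudaMalloc(&d_in, idata.size() * sizeof(int));
  cudaMalloc(&d_hash, n * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMemcpy(d_in, idata.data(), idata.size() * sizeof(int),
             cudaMemcpyHostToDevice);
  cudaMemcpy(d_hash, hash.data(), n * sizeof(int), cudaMemcpyHostToDevice);

  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  gather_kernel<<<blocks, threads>>>(n, d_out, d_in, d_hash);

  // This blocking copy also waits for the kernel to finish.
  cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_hash);
  cudaFree(d_out);
  return out;
}
```

Whether this actually beats the scatter version depends on the access pattern and the device, so it is worth profiling both before committing to the transformation.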

You might also make sure that your shared memory/L1 split is configured to favor L1 cache, on devices where that split is configurable. This won't help much with scattered writes, though.
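One way to request that split, sketched here for the kernel from the question, is cudaFuncSetCacheConfig with cudaFuncCachePreferL1. The wrapper name prefer_l1_for_kernel is hypothetical. The setting is only a preference: the runtime may ignore it, and it has no effect on devices with a fixed split.

```cuda
#include <cuda_runtime.h>

// The scatter kernel from the question, repeated so this sketch compiles
// on its own.
__global__ void
kernel (int n, int *odata, int *idata, int *hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[hash[i]] = idata[i];
}

// Hypothetical wrapper: asks the runtime to prefer a larger L1 cache over
// shared memory for this kernel. Call once before launching the kernel.
cudaError_t prefer_l1_for_kernel()
{
  return cudaFuncSetCacheConfig(kernel, cudaFuncCachePreferL1);
}
```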

Robert Crovella answered Jan 21 '26 00:01
