
CUDA Stream compaction: understanding the concept

I am using CUDA/Thrust/CUDPP. As I understand it, in stream compaction certain items in an array are marked as invalid and then "removed".

Now what does "removal" really mean here? Suppose the original array A has length 6. If 2 elements are invalid (by whatever condition we may provide), then

  1. Does the system create a new array of size 4 in GPU-memory to store the valid elements to get the final result?

  2. OR does it physically remove the invalid elements from memory and shrink the original array A down to size 4 keeping only the valid elements?

In either case, doesn't that mean that dynamic memory allocation is happening under the hood? But I have heard that dynamic memory allocation is not possible in the CUDA world.

smilingbuddha asked Oct 09 '22 07:10


1 Answer

First, dynamic memory allocation is possible in CUDA on Compute Capability 2.0 and higher devices. The CUDA runtime library supports malloc/free and new/delete in __device__ functions. But that is not germane to the answer, really.

Typically a large-enough output array is provided by the caller (pre-allocated, often the same size as the input array) and the valid elements are written to it. No dynamic allocation is required, but some storage may be wasted. This is what CUDPP and Thrust do. An alternative would be to count the valid elements first, then allocate exactly that much output GPU memory using cudaMalloc called from the host CPU.
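To make the two strategies concrete, here is a minimal sequential C++ sketch, not the actual CUDPP or Thrust implementation. It follows the flag / exclusive-scan / scatter pattern commonly used for compaction on GPUs, where each loop would be a parallel step. The names `compact_into` and `compact_exact` are made up for illustration, and `std::vector` stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strategy 1 (hypothetical helper): the caller supplies an output buffer at
// least as large as the input. Valid elements are packed at the front of
// `out`; the return value is how many were written. Slots past that count
// are left untouched, which is the potential storage waste.
template <typename T, typename Pred>
std::size_t compact_into(const std::vector<T>& in, std::vector<T>& out, Pred is_valid) {
    assert(out.size() >= in.size());

    // Step 1: flag each element (1 = keep, 0 = drop).
    std::vector<std::size_t> flags(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        flags[i] = is_valid(in[i]) ? 1 : 0;
    }

    // Step 2: an exclusive prefix sum of the flags gives every kept element
    // its index in the output.
    std::vector<std::size_t> offsets(in.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        offsets[i] = running;
        running += flags[i];
    }

    // Step 3: scatter the kept elements to their output positions.
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (flags[i]) {
            out[offsets[i]] = in[i];
        }
    }
    return running;  // number of valid elements
}

// Strategy 2 (hypothetical helper): count the valid elements first, then
// allocate an output of exactly that size. The vector allocation here stands
// in for a host-side cudaMalloc of `count` elements.
template <typename T, typename Pred>
std::vector<T> compact_exact(const std::vector<T>& in, Pred is_valid) {
    std::size_t count = 0;
    for (const T& x : in) {
        if (is_valid(x)) ++count;
    }
    std::vector<T> out(count);
    std::size_t k = 0;
    for (const T& x : in) {
        if (is_valid(x)) out[k++] = x;
    }
    return out;
}
```

With the 6-element array from the question and 2 invalid elements, `compact_into` returns 4 while the output buffer still has 6 slots; nothing is shrunk or freed. In Thrust, the corresponding calls are `thrust::copy_if`, which writes to a caller-supplied output range and returns an iterator to the end of the compacted data, and `thrust::remove_if`, which compacts in place and returns the new logical end without releasing any storage.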

harrism answered Oct 13 '22 10:10
