CUDA Stream compaction: understanding the concept

Question

I am using CUDA/Thrust/CUDPP. As I understand it, in stream compaction certain items in an array are marked as invalid and then "removed".

Now, what does "removal" really mean here? Suppose the original array A has length 6. If 2 elements are invalid (by whatever condition we provide), then:

Does the system create a new array of size 4 in GPU memory to store the valid elements and hold the final result?
Or does it physically remove the invalid elements from memory and shrink the original array A down to size 4, keeping only the valid elements?

In either case, doesn't that mean that dynamic memory allocation is happening under the hood? But I have heard that dynamic memory allocation is not possible in the CUDA world.

harrism · Accepted Answer

First, dynamic memory allocation is possible in CUDA on Compute Capability 2.0 and higher devices. The CUDA runtime library supports malloc/free and new/delete in __device__ functions. But that is not germane to the answer, really.

Typically, a large-enough output array is provided (pre-allocated, often the same size as the input array) and the output is written to it. No dynamic allocation is required, but some storage may be wasted. This is what CUDPP and Thrust do. An alternative is to count the valid elements first, then allocate an output array of exactly that size in GPU memory using cudaMalloc, called from the host CPU.
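
To make the two strategies concrete, here is a minimal host-side sketch in plain C++. It is only an analogue, not CUDPP or Thrust code: std::copy_if and std::count_if stand in for the GPU primitives (thrust::copy_if has the same shape, writing into a caller-provided output range and returning an iterator to the end of the valid output). The predicate is_valid and the function names compact_preallocated and compact_exact are hypothetical, chosen just for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical validity test for this sketch: negative values are "invalid".
inline bool is_valid(int x) { return x >= 0; }

// Strategy 1: the output buffer is pre-allocated as large as the input.
// Valid elements are copied to the front of `out`, and the number written
// is returned. Slots past that count are unused (the potential storage
// waste). The input array is never resized or modified.
inline std::size_t compact_preallocated(const std::vector<int>& in,
                                        std::vector<int>& out) {
    out.resize(in.size());  // worst case: every element is valid
    auto out_end = std::copy_if(in.begin(), in.end(), out.begin(), is_valid);
    return static_cast<std::size_t>(out_end - out.begin());
}

// Strategy 2: count the valid elements first, then allocate exactly that
// many. On the GPU, the allocation step would be a cudaMalloc issued from
// the host once the count is known.
inline std::vector<int> compact_exact(const std::vector<int>& in) {
    const auto n_valid = std::count_if(in.begin(), in.end(), is_valid);
    std::vector<int> out(static_cast<std::size_t>(n_valid));
    std::copy_if(in.begin(), in.end(), out.begin(), is_valid);
    return out;
}
```

With a 6-element input containing 2 invalid entries, compact_preallocated leaves a 6-element output buffer whose first 4 slots are meaningful, while compact_exact returns a 4-element array at the cost of a second pass over the input. In both cases the original array A keeps its length of 6.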

Tags: algorithm, cuda, gpu, thrust, cudpp

smilingbuddha
