Finding max value in CUDA

Question

I am trying to write CUDA code that finds the maximum value in a given set of numbers.

Assume you have 20 numbers, and the kernel is running on 2 blocks of 5 threads. Now assume the 10 threads compare the first 10 values at the same time, and thread 2 finds a max value, so thread 2 is updating the max value variable in global memory. While thread 2 is updating, what will happen to the remaining threads (1,3-10) that will be comparing using the old value?

If I lock the global variable using atomicCAS(), will the threads (1,3-10) compare using the old max value? How can I overcome this problem?

jwdmsd · Accepted Answer

This is purely a reduction problem. NVIDIA has a good presentation on optimizing parallel reduction on GPUs. You can use the same technique to find the minimum, maximum, or sum of all elements.
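To make the reduction idea concrete, here is a minimal sketch for `int` data. It is not the code from the NVIDIA presentation, and the names `maxKernel` and `maxOnDevice` are made up for this example. Each thread keeps a private running maximum, each block combines those in shared memory, and only one thread per block touches the global result, using `atomicMax`. That way no thread ever compares against a stale global value. The sketch assumes the block size is a power of two (so not 5 threads as in the question) and leaves out error checking for brevity.

```cuda
#include <climits>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel name. Each block reduces its share of the input to a
// single maximum in shared memory, then publishes it with one atomicMax.
__global__ void maxKernel(const int *in, int n, int *result)
{
    extern __shared__ int sdata[];

    int tid = threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    // Grid-stride loop: every thread keeps a private running maximum,
    // so no global memory is written during the comparisons.
    int local = INT_MIN;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += stride)
        local = max(local, in[i]);
    sdata[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory; assumes blockDim.x is a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = max(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    // One atomic update per block combines the per-block maxima.
    if (tid == 0)
        atomicMax(result, sdata[0]);
}

// Hypothetical host helper: copies the data to the device, launches the
// kernel, and returns the maximum (INT_MIN for empty input).
int maxOnDevice(const std::vector<int> &h)
{
    const int n = static_cast<int>(h.size());
    const int threads = 256;  // power of two, required by the tree reduction
    const int blocks = 2;     // two blocks, as in the question

    int *dIn = nullptr;
    int *dResult = nullptr;
    int hResult = INT_MIN;    // identity element for max

    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dResult, sizeof(int));
    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &hResult, sizeof(int), cudaMemcpyHostToDevice);

    maxKernel<<<blocks, threads, threads * sizeof(int)>>>(dIn, n, dResult);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&hResult, dResult, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dResult);
    return hResult;
}
```

Calling `maxOnDevice` on a vector of 20 integers returns the largest of them. For `float` data there is no built-in `atomicMax`, so one option is to write each block's maximum to an array and reduce that array in a second pass.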

Tags: parallel-processing, cuda, reduction

kar
