
CUDA __syncthreads() and recursion

Tags: recursion, cuda

I want to use __syncthreads() inside a recursive function, like this:

__device__ void foo(int k) {
  if (some_condition) {
    for (int i=0;i<8;i++) { 
       foo(i+k); // foo might take longer with some inputs
       __syncthreads();
    }
  }
}

How does __syncthreads() apply here? I know it only synchronizes the threads within a block. As far as I understand, the barrier applies to all threads of the block regardless of the recursion depth each one is at. But what if I want the __syncthreads() to apply only at a certain recursion depth? Is that even possible? I could check the recursion depth, but I believe that won't work either.

Are there possible alternatives?

I've seen that there are three __syncthreads() extensions for CUDA devices of compute capability >= 2.0:

int __syncthreads_count(int predicate);
int __syncthreads_and(int predicate);
int __syncthreads_or(int predicate);

But I don't think they will help, since they seem to behave like an atomic counter.

Pascal asked Jan 20 '23 04:01


2 Answers

As you know, __syncthreads() is only safe when all threads within a block reach the barrier. This means that if you call __syncthreads() inside a conditional, the condition must evaluate to the same value on every thread of the block.

For __syncthreads() within recursion, this means that all threads within a block must execute the recursion to the same depth; otherwise not all threads will reach the same barrier.
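One way to get a uniform depth even when threads have different amounts of work is to let the block vote on whether to recurse. The sketch below is only an illustration under stated assumptions: `drain`, `drain_kernel`, `run_uniform_depth_demo` and the `i % 5` workload are hypothetical names and data, not anything from the question. It relies on `__syncthreads_or()`, which is itself a barrier and returns a non-zero value on every thread if any thread's predicate is non-zero, so the branch around the recursive call is the same for the whole block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int THREADS = 64;  // one block, two warps

// Hypothetical recursive helper. `remaining` is per-thread work left.
// Every thread calls __syncthreads_or() at every level, and every thread
// gets the same answer, so all threads recurse to the same depth.
__device__ int drain(int remaining, int depth) {
    if (__syncthreads_or(remaining > 0)) {   // block-uniform condition
        if (remaining > 0) --remaining;      // finished threads simply idle
        return drain(remaining, depth + 1);
    }
    return depth;                            // identical on every thread
}

__global__ void drain_kernel(const int *work, int *depth_out) {
    depth_out[threadIdx.x] = drain(work[threadIdx.x], 0);
}

// Returns the depth all threads agreed on, or -1 on disagreement or CUDA error.
int run_uniform_depth_demo() {
    int h_work[THREADS], h_depth[THREADS];
    for (int i = 0; i < THREADS; ++i) h_work[i] = i % 5;  // uneven work, max 4

    int *d_work = nullptr, *d_depth = nullptr;
    if (cudaMalloc(&d_work, sizeof(h_work)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_depth, sizeof(h_depth)) != cudaSuccess) {
        cudaFree(d_work);
        return -1;
    }
    cudaMemcpy(d_work, h_work, sizeof(h_work), cudaMemcpyHostToDevice);

    drain_kernel<<<1, THREADS>>>(d_work, d_depth);
    // The blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy(h_depth, d_depth, sizeof(h_depth),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_work);
    cudaFree(d_depth);
    if (err != cudaSuccess) return -1;

    for (int i = 1; i < THREADS; ++i)
        if (h_depth[i] != h_depth[0]) return -1;
    return h_depth[0];
}

int main() {
    printf("common recursion depth: %d\n", run_uniform_depth_demo());
    return 0;
}
```

In this sketch the block keeps recursing until the slowest thread is done, so the common depth equals the largest per-thread workload. The cost is that faster threads idle at each level, and deep recursion may need a larger per-thread stack than the default.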

Tom answered Jan 30 '23 22:01


Are there possible alternatives?

Yes: don't use recursion to express your function's logic.
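As a rough illustration of what that can look like, here is a sketch that assumes the recursion is a divide-and-conquer sum (the question does not say what foo actually computes, so this is a stand-in). Each loop iteration plays the role of one recursion level, the trip count is the same for every thread, and the barrier sits outside the divergent `if`. The names `block_sum`, `run_block_sum` and `BLOCK` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block, assumed to be a power of two

// Iterative tree reduction: sums BLOCK integers within one block.
__global__ void block_sum(const int *in, int *out) {
    __shared__ int buf[BLOCK];
    buf[threadIdx.x] = in[threadIdx.x];
    __syncthreads();

    // One iteration per "recursion level"; every thread runs every iteration.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            buf[threadIdx.x] += buf[threadIdx.x + stride];
        __syncthreads();  // reached by all threads, not only the active ones
    }
    if (threadIdx.x == 0) *out = buf[0];
}

// Sums BLOCK ones on the device; returns the sum, or -1 on a CUDA error.
int run_block_sum() {
    int h_in[BLOCK];
    for (int i = 0; i < BLOCK; ++i) h_in[i] = 1;
    int h_out = -1;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    block_sum<<<1, BLOCK>>>(d_in, d_out);
    // The blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}

int main() {
    printf("block sum: %d\n", run_block_sum());
    return 0;
}
```

Because the loop bounds depend only on `BLOCK`, every thread passes through the same number of barriers, which is the property the recursive version could not guarantee.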

fabrizioM answered Jan 30 '23 23:01