CUDA __syncthreads() and recursion

Question

I want to use __syncthreads() inside a recursive function like this:

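__device__ bool some_condition(int k);  // placeholder for the real termination test

__device__ void foo(int k) {
  if (some_condition(k)) {
    for (int i = 0; i < 8; i++) {
      foo(i + k);       // foo might take longer with some inputs
      __syncthreads();  // which threads, at which call depth, does this barrier wait for?
    }
  }
}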

How does this __syncthreads() apply here? I know it only applies within a block. As far as I understand, it holds for all threads of the block regardless of the recursion depth? But what if I wanted to make sure that this __syncthreads() applies only at a certain recursion depth? Is that even possible? I could check the recursion depth, but I believe that won't work either.

Are there possible alternatives?

I've seen that there are three __syncthreads() variants for devices of compute capability 2.0 and higher:

int __syncthreads_count(int predicate);
int __syncthreads_and(int predicate);
int __syncthreads_or(int predicate);

But I don't think they will help, since they seem to act like atomic counters.
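For reference, here is a minimal sketch of what the three variants do, assuming a single block of at most 1024 threads; `vote_demo` and `run_vote_demo` are hypothetical names. According to the CUDA Programming Guide, each variant is a full block-wide barrier with the same rules as __syncthreads(). Each one additionally evaluates `predicate` for every thread and returns the same reduced value to all of them.

```cuda
#include <cuda_runtime.h>

// Each call below is a barrier (every thread of the block must reach it) that also
// reduces `pred` over the whole block and hands the result to every thread.
__global__ void vote_demo(int *out) {
    int pred = (threadIdx.x % 2 == 0);       // nonzero for even-numbered threads
    int count = __syncthreads_count(pred);   // number of threads with nonzero predicate
    int all   = __syncthreads_and(pred);     // nonzero iff predicate is nonzero for all threads
    int any   = __syncthreads_or(pred);      // nonzero iff predicate is nonzero for any thread
    if (threadIdx.x == 0) {
        out[0] = count;
        out[1] = (all != 0);
        out[2] = (any != 0);
    }
}

// Hypothetical host helper: launches one block of n threads and copies back
// {count, and, or} into result.
void run_vote_demo(int n, int result[3]) {
    int *d_out;
    cudaMalloc(&d_out, 3 * sizeof(int));
    vote_demo<<<1, n>>>(d_out);
    cudaMemcpy(result, d_out, 3 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

With 64 threads, half of which are even, the count is 32, the AND is 0 and the OR is nonzero. The functions behave less like an atomic counter than like a barrier that doubles as a block-wide vote.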

Tom · Accepted Answer

As you know, __syncthreads() is only safe where all threads within a block reach the barrier. This means that if you call __syncthreads() from within a conditional, the condition must evaluate to the same value on all threads within the block.
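The following is a minimal sketch of the safe pattern, using the hypothetical names `reduce_level`, `block_sum` and `run_block_sum`. Per-thread work may sit inside a divergent `if`, but the barrier itself stays outside that branch. The recursion stops on a value (`stride`, derived from `blockDim.x`) that is identical for every thread. The sketch assumes a single block whose size is a power of two, at most 1024.

```cuda
#include <cuda_runtime.h>

// Recursive shared-memory sum. `stride` depends only on blockDim.x, so every
// thread recurses to the same depth and executes the same sequence of barriers.
__device__ void reduce_level(int *buf, int stride) {
    if (stride == 0) return;                             // block-uniform termination
    if (threadIdx.x < stride)                            // divergent work is fine here...
        buf[threadIdx.x] += buf[threadIdx.x + stride];
    __syncthreads();                                     // ...because the barrier is outside the branch
    reduce_level(buf, stride / 2);
}

__global__ void block_sum(const int *in, int *out) {
    extern __shared__ int buf[];                         // sized at launch: blockDim.x ints
    buf[threadIdx.x] = in[threadIdx.x];
    __syncthreads();
    reduce_level(buf, blockDim.x / 2);                   // assumes blockDim.x is a power of two
    if (threadIdx.x == 0) *out = buf[0];
}

// Hypothetical host helper: sums 0..n-1 using one block of n threads.
int run_block_sum(int n) {
    int h_in[1024];
    for (int i = 0; i < n; ++i) h_in[i] = i;
    int *d_in, *d_out, h_out = -1;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<1, n, n * sizeof(int)>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For example, `run_block_sum(256)` adds 0 through 255 and should return 32640.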

For __syncthreads() within recursion, this means that all threads within a block must execute the recursion to the same depth; otherwise not all threads will reach the same barrier.
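If the required depth genuinely differs per thread, one possible workaround (a sketch, not the only option) is to let every thread recurse to the block's maximum depth while finished threads skip the work. `__syncthreads_or()` fits this approach: it is a barrier that all threads reach, and it returns the same value to all of them, so the early return below is block-uniform. `foo_voted`, `voted_kernel`, `run_voted` and the per-thread target `threadIdx.x % 5` are hypothetical. The example assumes compute capability 2.0 or higher and a single block of at most 1024 threads.

```cuda
#include <cuda_runtime.h>

// Thread t wants to recurse to its own depth `target`; the block as a whole keeps
// recursing while any thread is still active, so all threads hit the same barriers.
__device__ void foo_voted(int depth, int target, int *steps) {
    int active = depth < target;              // per-thread condition, differs across the block
    // Barrier + block-wide OR: every thread receives the same result,
    // so either all threads return here or none do.
    if (!__syncthreads_or(active)) return;
    if (active) steps[threadIdx.x] += 1;      // only threads that still need work do it
    foo_voted(depth + 1, target, steps);
}

__global__ void voted_kernel(int *steps) {
    steps[threadIdx.x] = 0;
    foo_voted(0, threadIdx.x % 5, steps);     // hypothetical per-thread depths 0..4
}

// Hypothetical host helper: returns the total number of steps taken by n threads.
int run_voted(int n) {
    int *d_steps, h_steps[1024];
    cudaMalloc(&d_steps, n * sizeof(int));
    voted_kernel<<<1, n>>>(d_steps);
    cudaMemcpy(h_steps, d_steps, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_steps);
    int total = 0;
    for (int i = 0; i < n; ++i) total += h_steps[i];
    return total;
}
```

With 10 threads the targets are 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, so the total is 20. Every thread still executes five votes before the block returns together.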

fabrizioM · Answer

Are there possible alternatives?

Yes: don't use the recursion paradigm to express your function logic.
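The following is a minimal sketch of that advice. It relies on several assumptions not in the question. The recursion is a hypothetical binary traversal whose per-thread depth limit is `threadIdx.x % 4`. `should_expand` stands in for `some_condition`. `MAX_STACK` is a guessed bound that must cover the deepest traversal. Finally, the barrier is only needed once each thread's traversal is complete. Each thread walks its own tree with an explicit stack and no barrier inside the loop, so the single __syncthreads() afterwards is reached exactly once by every thread.

```cuda
#include <cuda_runtime.h>

#define MAX_STACK 64  // hypothetical bound; must cover the deepest traversal

// Hypothetical stand-in for `some_condition`: expand while depth < limit.
__device__ bool should_expand(int depth, int limit) { return depth < limit; }

// Iterative equivalent of a recursive visit(depth) that counts itself and,
// while should_expand(depth, limit) holds, visits two children at depth + 1.
__device__ int count_nodes(int limit) {
    int stack[MAX_STACK];                     // per-thread explicit stack of pending depths
    int top = 0, visited = 0;
    stack[top++] = 0;                         // root at depth 0
    while (top > 0) {
        int depth = stack[--top];
        ++visited;
        if (should_expand(depth, limit))
            for (int i = 0; i < 2; ++i) stack[top++] = depth + 1;
    }
    return visited;
}

__global__ void iterative_kernel(int *out) {
    __shared__ int per_thread[1024];
    per_thread[threadIdx.x] = count_nodes(threadIdx.x % 4);  // per-thread work, no barrier inside
    __syncthreads();                                         // reached exactly once by every thread
    if (threadIdx.x == 0) {
        int total = 0;
        for (int i = 0; i < blockDim.x; ++i) total += per_thread[i];
        *out = total;
    }
}

// Hypothetical host helper: total nodes visited by one block of n threads.
int run_iterative(int n) {
    int *d_out, h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    iterative_kernel<<<1, n>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

A full binary tree of depth L has 2^(L+1) - 1 nodes. With 4 threads (limits 0 to 3), the threads visit 1, 3, 7 and 15 nodes, for a total of 26.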
