I want to use __syncthreads() inside a recursive function, like this:
__device__ void foo(int k) {
if (some_condition) {
for (int i=0;i<8;i++) {
foo(i+k); // foo might take longer with some inputs
__syncthreads();
}
}
}
How does __syncthreads() apply here? I know it only synchronizes threads within a block. As far as I understand, it applies to all threads of the block regardless of recursion depth? But what if I wanted this __syncthreads() to apply only at a certain recursion depth? Is that even possible? I could check the recursion depth, but I believe that won't work either.
Are there possible alternatives?
I've seen that there are three __syncthreads() extensions for CUDA devices of compute capability 2.0 and higher:
int __syncthreads_count(int predicate);
int __syncthreads_and(int predicate);
int __syncthreads_or(int predicate);
But I don't think they will help since they seem like an atomic counter.
As you know, __syncthreads() is only safe where all threads within a block reach the barrier. This means that if you call __syncthreads() from within a conditional branch, the condition must evaluate to the same value on all threads within the block.
For __syncthreads() within recursion, this means that all threads within a block must execute the recursion to the same depth; otherwise not all threads will reach the same barrier.
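To illustrate the uniform-condition rule, here is a minimal sketch (not taken from the question's code) of one way to turn a per-thread condition into a block-wide one before branching around a barrier. It uses __syncthreads_or(), which returns a nonzero value to every thread of the block if any thread passed a nonzero predicate. The kernel name uniform_branch, the flags/out arrays, the block size of 64 and the host helper run_uniform_branch_demo are all assumptions made up for the example.

```cuda
#include <cuda_runtime.h>

constexpr int BLOCK = 64;   // assumed block size for this sketch

// Hypothetical kernel: flags differ per thread, but the branch that contains
// the barrier is decided by a block-wide vote, so it is uniform per block.
__global__ void uniform_branch(const int *flags, int *out)
{
    __shared__ int scratch[BLOCK];
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    int mine = flags[gid];
    // Barrier + vote: nonzero on ALL threads of the block if ANY flag is set.
    int any = __syncthreads_or(mine);

    int result = -1;
    if (any) {                       // same decision on every thread
        scratch[tid] = mine;
        __syncthreads();             // safe: the whole block arrives here
        result = scratch[(tid + 1) % BLOCK];   // read the neighbour's flag
    }
    out[gid] = result;
}

// Host helper: returns the number of mismatches, or -1 on a CUDA error.
int run_uniform_branch_demo()
{
    const int n = 2 * BLOCK;
    int h_flags[n] = {0}, h_out[n];
    h_flags[3] = 1;                  // only one thread in block 0 has work

    int *d_flags = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_flags, sizeof(h_flags)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        cudaFree(d_flags);
        return -1;
    }
    cudaMemcpy(d_flags, h_flags, sizeof(h_flags), cudaMemcpyHostToDevice);
    uniform_branch<<<2, BLOCK>>>(d_flags, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_flags);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        int block = i / BLOCK, tid = i % BLOCK;
        // Block 0 took the branch as a whole; block 1 skipped it as a whole.
        int expected = (block == 0) ? h_flags[(tid + 1) % BLOCK] : -1;
        if (h_out[i] != expected) ++bad;
    }
    return bad;
}
```

Branching directly on the per-thread flag here would send only one thread of block 0 to the barrier, which is exactly the unsafe case described above.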
Are there possible alternatives?
Yes: don't use the recursion paradigm to express your function logic.
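As a rough sketch of what a non-recursive formulation could look like (an assumption about the shape of the problem, not a drop-in replacement for foo), the recursion can be flattened into a loop in which every thread of the block runs the same number of iterations. Per-thread work is predicated, and the barrier sits outside the predicate. The names iterative_foo, step and run_iterative_demo are hypothetical, and halving a value stands in for whatever one level of foo actually does.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-thread work; stands in for one level of the original foo().
__device__ int step(int v) { return v / 2; }

__global__ void iterative_foo(int *values, int *rounds_out)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int v = values[gid];
    int rounds = 0;

    while (true) {
        int active = (v > 1);        // per-thread condition, may diverge
        // Barrier + vote: every thread sees the same result, so the whole
        // block leaves the loop in the same iteration.
        if (!__syncthreads_or(active)) break;
        if (active) v = step(v);     // divergent part contains no barrier
        ++rounds;
    }
    values[gid] = v;
    if (threadIdx.x == 0) rounds_out[blockIdx.x] = rounds;
}

// Host helper: returns the number of rounds the block ran, or -1 on failure.
int run_iterative_demo()
{
    const int n = 64;                // one block of 64 threads
    int h_vals[n], h_rounds = 0;
    for (int i = 0; i < n; ++i) h_vals[i] = i + 1;   // inputs 1..64

    int *d_vals = nullptr, *d_rounds = nullptr;
    if (cudaMalloc(&d_vals, sizeof(h_vals)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_rounds, sizeof(int)) != cudaSuccess) {
        cudaFree(d_vals);
        return -1;
    }
    cudaMemcpy(d_vals, h_vals, sizeof(h_vals), cudaMemcpyHostToDevice);
    iterative_foo<<<1, n>>>(d_vals, d_rounds);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h_vals, d_vals, sizeof(h_vals), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_rounds, d_rounds, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_vals);
    cudaFree(d_rounds);
    if (err != cudaSuccess) return -1;

    for (int i = 0; i < n; ++i)
        if (h_vals[i] != 1) return -1;   // every value should be halved to 1
    return h_rounds;
}
```

Threads whose input needs fewer steps simply idle through the remaining rounds, so the block stays in lockstep at every barrier even though the amount of work per thread differs.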