Shuffle instruction in CUDA not working

Tags: c++, shuffle, cuda

I have a problem with the shuffle instruction in CUDA 5.0.

This is a snippet of my kernel. It is inside a loop. The printf calls are there only for debugging, because I can't use an ordinary debugger:

...
tex_val = tex2D(srcTexRef, threadIdx.x + w, y_pos);
if (threadIdx.x == 0)
{
    left = left_value[y_pos];
}
else
{
    printf("thread %d; shfl value: %f \n", threadIdx.x, __shfl_up(value, 1));
    left = __shfl_up(value, 1);
}

printf("thread %d; value: %f; tex_val: %f; left: %f \n", threadIdx.x, value, tex_val, left);
...

From that I get this output:

l0:  ITERATION 1
l1:  thread 0; value: 0; tex_val: 1; left: 4
l2: 
l3:  ITERATION 2
l4:  thread 1; shfl value: 0
l5:  thread 0; value: 5; tex_val: 1; left: 5
l6:  thread 1; value: 0; tex_val: 1; left: 0
l7: 
l8:  ITERATION 3
l9:  thread 1; shfl value: 0
l10: thread 2; shfl value: 1
l11: thread 0; value: 6; tex_val: 1; left: 6
l12: thread 1; value: 1; tex_val: 1; left: 0
l13: thread 2; value: 2; tex_val: 1; left: 1
...

From the output I can see that thread 1 never gets the value from thread 0 in any iteration, even though thread 0 clearly has one (line 4: shfl value is 0; line 5: value is 5). Threads 2 and higher do get the value from the thread below them. Where am I making a mistake? Is it happening because of the branching?

benderto asked Mar 16 '23 03:03



1 Answer

Yes, it's because of the branching. Quoting from the CUDA Programming Guide, section B.14.2:

The __shfl() intrinsics permit exchanging of a variable between threads within a warp without use of shared memory. The exchange occurs simultaneously for all active threads within the warp, ...

and

Threads may only read data from another thread which is actively participating in the __shfl() command. If the target thread is inactive, the retrieved value is undefined.

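In a branch, the active threads are those taking the same path of execution; threads taking a different path are inactive. In your case, thread 0 takes the if branch, so it is inactive while the other threads execute __shfl_up() in the else branch. Thread 1 therefore reads from an inactive thread, and the value it gets is undefined.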
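One way to avoid reading from an inactive thread is to move the shuffle out of the branch, so that every thread in the warp executes it, and to make only the choice of result conditional. The following is a minimal sketch under some assumptions, not the asker's actual kernel: the names shift_from_left, run_shift_demo, in, out and boundary are hypothetical, the block is assumed to be exactly one warp (32 threads), and it uses __shfl_up_sync() from CUDA 9 and later. On the CUDA 5.0 toolkit from the question, the equivalent call would be __shfl_up(value, 1).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread receives the value held by the thread to
// its left; thread 0 uses a separately supplied boundary value instead.
__global__ void shift_from_left(const float* in, float boundary, float* out)
{
    float value = in[threadIdx.x];

    // The shuffle sits outside any branch, so all 32 lanes take part and
    // every source lane is active when it is read.
    // (Assumption: CUDA 9+; on CUDA 5.0 this would be __shfl_up(value, 1).)
    float from_left = __shfl_up_sync(0xffffffffu, value, 1);

    // Only the selection is conditional: thread 0 discards the shuffle result.
    float left = (threadIdx.x == 0) ? boundary : from_left;

    out[threadIdx.x] = left;
}

// Hypothetical host helper: runs the kernel on one full warp of 32 values.
// Returns true if every CUDA call succeeded.
bool run_shift_demo(const float* host_in, float boundary, float* host_out)
{
    const int n = 32;                      // one warp
    const size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;

    if (cudaMalloc((void**)&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void**)&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    cudaError_t err = cudaMemcpy(d_in, host_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        shift_from_left<<<1, n>>>(d_in, boundary, d_out);
        // This copy waits for the kernel and also reports kernel errors.
        err = cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Calling run_shift_demo with an input array of 32 floats should leave out[0] equal to boundary and out[i] equal to in[i - 1] for every other thread, including thread 1, which is the case that failed in the question.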

Angew is no longer proud of SO answered Mar 25 '23 03:03
