I am doing the following:
__shared__ int exForBlockLessThanP = totalElementLessThanPivotEntireBlock[blockIdx.x];
where totalElementLessThanPivotEntireBlock is an array on the GPU. The compiler is throwing an error, as stated in the title of the question. I really don't understand why this is a problem.
Static initialization of shared variables is illegal in CUDA. The problem is that the semantics of how every thread should treat static initialization of shared memory are undefined in the programming model. Which thread should do the write? What happens if the value is not uniform between threads? How should the compiler emit code for such a case, and how should the hardware run it?
In your nonsensical example you are asking every thread in the block to initialize the same shared variable with a value -- basically a statically compiled memory race.
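A common way around this is to declare the shared variable without an initializer, let exactly one thread per block do the write, and then synchronize before any thread reads it. The following is only a minimal sketch under some assumptions: the kernel name `useBlockTotal`, the output array `out`, the host helper `runUseBlockTotal`, and the launch sizes are all made up for illustration and are not from your code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each block copies its own entry of the global array
// into a shared variable, then every thread in the block reads it back.
__global__ void useBlockTotal(const int *totalElementLessThanPivotEntireBlock,
                              int *out)
{
    // Declaration only: a __shared__ variable cannot have an initializer.
    __shared__ int exForBlockLessThanP;

    // Exactly one thread per block performs the write, so there is no race.
    if (threadIdx.x == 0) {
        exForBlockLessThanP = totalElementLessThanPivotEntireBlock[blockIdx.x];
    }

    // Barrier: the write must be complete before any thread reads the value.
    __syncthreads();

    out[blockIdx.x * blockDim.x + threadIdx.x] = exForBlockLessThanP;
}

// Hypothetical host helper: launches the kernel and checks that every thread
// in block b read the b-th entry of the input array.
bool runUseBlockTotal()
{
    constexpr int blocks = 4, threads = 32;  // illustrative sizes
    int hostTotals[blocks] = {10, 20, 30, 40};
    int hostOut[blocks * threads] = {0};

    int *devTotals = nullptr, *devOut = nullptr;
    if (cudaMalloc(&devTotals, sizeof(hostTotals)) != cudaSuccess) return false;
    if (cudaMalloc(&devOut, sizeof(hostOut)) != cudaSuccess) {
        cudaFree(devTotals);
        return false;
    }
    cudaMemcpy(devTotals, hostTotals, sizeof(hostTotals), cudaMemcpyHostToDevice);

    useBlockTotal<<<blocks, threads>>>(devTotals, devOut);

    // This copy also waits for the kernel to finish and reports its errors.
    cudaError_t err = cudaMemcpy(hostOut, devOut, sizeof(hostOut),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devTotals);
    cudaFree(devOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < blocks * threads; ++i) {
        if (hostOut[i] != hostTotals[i / threads]) return false;
    }
    return true;
}

int main()
{
    bool ok = runUseBlockTotal();
    printf("%s\n", ok ? "all threads read their block's value" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

The `__syncthreads()` call is what makes this safe: without it, threads other than thread 0 could read `exForBlockLessThanP` before it has been written.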