Consider the following code:
__global__ void kernel(int *something) {
    extern __shared__ int shared_array[];
    // Some operations on shared_array here.
}
Is it possible to set the whole shared_array to some value (e.g. 0) without explicitly addressing each cell in some thread?
Size and Bandwidth: Per-block shared memory is faster than global memory and constant memory, but slower than per-thread registers. On a K20, each block can use at most 48 KB of shared memory. Per-thread registers hold only a small amount of data, but are the fastest.
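Since the 48 KB figure is specific to one GPU, it can help to query the limit at run time rather than hard-code it. The following is a minimal sketch using cudaGetDeviceProperties; the helper names shared_mem_per_block and print_limits are hypothetical and only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the per-block shared memory limit in bytes
// for the given device, or 0 if the query fails (e.g. no CUDA device).
size_t shared_mem_per_block(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.sharedMemPerBlock;
}

// Hypothetical helper: prints the shared memory and register limits.
void print_limits(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        printf("could not query device %d\n", device);
        return;
    }
    printf("%s: %zu bytes of shared memory per block, %d registers per block\n",
           prop.name, prop.sharedMemPerBlock, prop.regsPerBlock);
}
```

Calling print_limits(0) from main reports the limits of the first device; the numbers depend on the GPU in use.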
Shared memory is a powerful feature for writing well-optimized CUDA code. Access to shared memory is much faster than global memory access because shared memory is located on chip. Because it is shared by the threads in a thread block, it also provides a mechanism for those threads to cooperate.
Shared memory is a CUDA memory space that is shared by all threads in a thread block. Here, shared means that every thread in the block can read from and write to the block's shared memory, and that a write made by one thread becomes visible to the other threads in the block once they have synchronized (for example with __syncthreads()).
array = (int *)shmat(shmid, 0, 0); array = malloc(sizeof(int)*count); The second assignment overwrites the pointer returned by shmat. Use a different pointer if you also want an array in ordinary process memory, or remove the malloc line. The shared memory segment already has the size you specified when you created it.
You can efficiently initialize shared arrays in parallel like this:
// if SHARED_SIZE == blockDim.x, eliminate this loop
for (int i = threadIdx.x; i < SHARED_SIZE; i += blockDim.x)
    shared_array[i] = INITIAL_VALUE;
__syncthreads();
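To show the loop above in context, here is a minimal sketch of a complete kernel and launch. It assumes a single block and passes the array length as a kernel argument instead of the SHARED_SIZE constant; the names fill_shared and count_fill_mismatches are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Fills the dynamically sized shared array with `value` using a block-stride
// loop, then copies it (reversed) to global memory so the host can check it.
__global__ void fill_shared(int *out, int n, int value) {
    extern __shared__ int shared_array[];

    for (int i = threadIdx.x; i < n; i += blockDim.x)
        shared_array[i] = value;
    __syncthreads();  // every write above is visible to the block from here on

    // Each thread reads cells that were written by other threads, which is
    // only safe because of the barrier above.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] = shared_array[n - 1 - i];
}

// Launches one block and returns how many elements differ from `value`,
// or -1 if any CUDA call fails.
int count_fill_mismatches(int threads, int n, int value) {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    // Third launch parameter: bytes of dynamic shared memory per block.
    fill_shared<<<1, threads, n * sizeof(int)>>>(d_out, n, value);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_out); return -1; }

    std::vector<int> host(n);
    cudaError_t err = cudaMemcpy(host.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int v : host) mismatches += (v != value);
    return mismatches;
}
```

With 256 threads and 1000 elements each thread handles about four cells, so the cost of initialization is spread across the block rather than falling on one thread.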
No. Shared memory is uninitialised. You have to somehow initialise it yourself, one way or another...
From CUDA C Programming Guide 3.2, Section B.2.4.2, paragraph 2:
__shared__ variables cannot have an initialization as part of their declaration.
This also rules out nontrivial default constructors for shared variables.
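As an illustration of that restriction, the sketch below declares a statically sized shared variable without an initializer and lets one thread set its initial state before the block uses it. The Counter struct, the use_counter kernel, and the run_counter helper are all hypothetical names chosen for this example.

```cuda
#include <cuda_runtime.h>

// Hypothetical plain struct with a trivial default constructor.
struct Counter {
    int hits;
    int misses;
};

__global__ void use_counter(int *out) {
    // __shared__ Counter c = {0, 0};  // not allowed: no initializer on __shared__
    __shared__ Counter c;              // contents are undefined at this point

    if (threadIdx.x == 0) {            // one thread sets the initial state
        c.hits = 0;
        c.misses = 0;
    }
    __syncthreads();                   // make the initial state visible to all

    atomicAdd(&c.hits, 1);             // every thread in the block counts once
    __syncthreads();

    if (threadIdx.x == 0) out[blockIdx.x] = c.hits;
}

// Launches one block and returns the final hit count, or -1 on a CUDA error.
int run_counter(int threads) {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;

    use_counter<<<1, threads>>>(d_out);

    int result = -1;
    cudaError_t err = cudaMemcpy(&result, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? result : -1;
}
```

For a small variable like this, having a single thread do the initialization is usually sufficient; for arrays, the block-stride loop is the more natural choice.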