
How to differentiate between pointers to shared and global memory?

Tags: cuda

In CUDA, given the value of a pointer or the address of a variable, is there an intrinsic or other API that reports which address space the pointer refers to?

asked May 21 '13 19:05 by Jared Hoberock



1 Answer

The CUDA header file sm_20_intrinsics.h defines the function

__device__ unsigned int __isGlobal(const void *ptr)
{
  unsigned int ret;
  asm volatile ("{ \n\t"
                "    .reg .pred p; \n\t"
                "    isspacep.global p, %1; \n\t"
                "    selp.u32 %0, 1, 0, p;  \n\t"
#if (defined(_MSC_VER) && defined(_WIN64)) || defined(__LP64__)
                "} \n\t" : "=r"(ret) : "l"(ptr));
#else
                "} \n\t" : "=r"(ret) : "r"(ptr));
#endif

  return ret;
}

This function returns 1 if the generic address ptr is in the global memory space. It returns 0 if ptr is in the shared, local or constant memory space.

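The PTX instruction isspacep does the heavy lifting: it tests whether a generic address falls within a given state space. Since it accepts .shared as well as .global, it seems like we should be able to build the analogous function for shared memory this way: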

__device__ unsigned int __isShared(const void *ptr)
{
  unsigned int ret;
  asm volatile ("{ \n\t"
                "    .reg .pred p; \n\t"
                "    isspacep.shared p, %1; \n\t"
                "    selp.u32 %0, 1, 0, p;  \n\t"
#if (defined(_MSC_VER) && defined(_WIN64)) || defined(__LP64__)
                "} \n\t" : "=r"(ret) : "l"(ptr));
#else
                "} \n\t" : "=r"(ret) : "r"(ptr));
#endif

  return ret;
}
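
To check the two predicates against each other, here is a minimal sketch of a test program. It is only an illustration under a few assumptions: the helper is renamed is_shared_ptr (a hypothetical name, chosen because newer toolkits may already declare their own __isShared), the build is 64-bit so only the "l" constraint is kept, and probe and probe_address_spaces are made-up names for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): 1 if ptr is in shared memory, else 0.
// Same isspacep pattern as above; assumes 64-bit pointers ("l" constraint).
__device__ unsigned int is_shared_ptr(const void *ptr)
{
  unsigned int ret;
  asm volatile ("{ \n\t"
                "    .reg .pred p; \n\t"
                "    isspacep.shared p, %1; \n\t"
                "    selp.u32 %0, 1, 0, p; \n\t"
                "} \n\t" : "=r"(ret) : "l"(ptr));
  return ret;
}

// Single-thread kernel: classify one global pointer and one shared pointer.
__global__ void probe(int *global_buf, unsigned int *out)
{
  __shared__ int tile[1];
  tile[0] = global_buf[0];             // use both buffers so neither is optimized away

  out[0] = __isGlobal(global_buf);     // buffer allocated with cudaMalloc
  out[1] = is_shared_ptr(global_buf);
  out[2] = __isGlobal(tile);           // __shared__ array
  out[3] = is_shared_ptr(tile);

  global_buf[0] = tile[0] + 1;
}

// Launches probe and copies the four flags back; returns false on any CUDA error.
bool probe_address_spaces(unsigned int flags[4])
{
  int *d_buf = nullptr;
  unsigned int *d_out = nullptr;
  if (cudaMalloc(&d_buf, sizeof(int)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, 4 * sizeof(unsigned int)) != cudaSuccess) {
    cudaFree(d_buf);
    return false;
  }
  cudaMemset(d_buf, 0, sizeof(int));

  probe<<<1, 1>>>(d_buf, d_out);

  // cudaMemcpy waits for the kernel and also reports launch/execution errors.
  cudaError_t err = cudaMemcpy(flags, d_out, 4 * sizeof(unsigned int),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_buf);
  cudaFree(d_out);
  return err == cudaSuccess;
}

int main()
{
  unsigned int flags[4] = {0, 0, 0, 0};
  if (!probe_address_spaces(flags)) {
    printf("CUDA error\n");
    return 1;
  }
  printf("global ptr: isGlobal=%u isShared=%u\n", flags[0], flags[1]);
  printf("shared ptr: isGlobal=%u isShared=%u\n", flags[2], flags[3]);
  return 0;
}
```

Compiled with nvcc for a device of compute capability 2.0 or later, the global pointer is expected to report isGlobal=1 isShared=0 and the shared pointer isGlobal=0 isShared=1. On a 32-bit build the "l" constraint would have to become "r", as in the header's #if branch.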
answered Oct 13 '22 08:10 by Jared Hoberock