Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NVPTX generic memory space location in architecture

In NVPTX(LLVM IR) for CUDA programs, there're identifiers for memory address space from 0 to 5 (see Table below).

enter image description here

I saw in the same LLVM IR program, memory addresses are identified as 'Generic' or other types as shown in pictures.

For 'Generic' (by default, there's no identifier): enter image description here

For 'Shared': enter image description here

My question is that, for the generic memory address space, where is the data is actually located in hardware, off-chip, on-chip memory or local registers? Can someone explain how generic type of address space is finally managed?

like image 935
cache Avatar asked Dec 19 '22 00:12

cache


1 Answers

The answer is quite easy: Generic address space has no hardware representation.

You can see the generic address space (AS) as a logical AS where each of the other AS are combined. For example: the following kernel calls and a device function which accepts a pointer.

__device__ void bar(int* x){
   *x = *x + 1;
}

__global__ void foo(int* x){
   __shared__ int y[1];
   bar(x); 
   bar(y);
}

You can pass any pointer to the function. From the language point of view it does not mater if the pointer is in AS 1 (global) or AS 3 (shared). In C++ (and CUDA C/C++) you don't have to specify the AS explicitly. In OpenCL < 2.0 for example you have to explicitly add a modifier to each pointer and have to provide a function bar which takes the specific AS pointer.

What happens in LLVM IR is, that the pointer witch is passed to the function gets casted via an addresspacecast instruction to the generic AS. In PTX addresspacecast is represented by the cvta instruction:

// convert const, global, local, or shared address to generic address
cvta.space.size  p, a;        // source address in register a
cvta.space.size  p, var;      // get generic address of var
cvta.space.size  p, var+imm;  // generic address of var+offset

// convert generic address to const, global, local, or shared address
cvta.to.space.size  p, a;

.space = { .const, .global, .local, .shared };
.size  = { .u32, .u64 };

Generic pointers are mapped to the global memory unless it falls within an address region reserved for the other AS. The hardware subtracts the start address of the AS from the generic pointer to determine the correct memory region.

Atomics are a good example:

atom{.space}.op.type  d, [a], b;
atom{.space}.op.type  d, [a], b, c;

You can specify an address space or let the hardware choose. If you want to generate the correct atomic instruction without the pointer subtraction overhead, the back-end is responsible for casting the pointer back to the correct address space.

like image 198
Michael Haidl Avatar answered Dec 31 '22 14:12

Michael Haidl