
Casting float* to char* while looping over a 2-D array in linear memory on device

Tags: c++, pointers, cuda

On page 21 of the CUDA 4.0 programming guide there is an example (given below) that illustrates looping over the elements of a 2D array of floats in device memory. The dimensions of the 2D array are width x height.

// Host code
int width = 64, height = 64;
float* devPtr;
size_t pitch;
cudaMallocPitch(&devPtr, &pitch, width * sizeof(float), height);
MyKernel<<<100, 512>>>(devPtr, pitch, width, height);


// Device code
__global__ void MyKernel(float* devPtr, size_t pitch, int width, int height)
{
    for (int r = 0; r < height; ++r)
    {
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < width; ++c)
        {
            float element = row[c];
        }
    }
}

Why has the device memory pointer devPtr been cast to a character pointer, char*, in the kernel function? Can someone explain that line, please? It looks a bit weird.

smilingbuddha asked Jan 18 '23 20:01


1 Answer

This is due to the way pointer arithmetic works in C. When you add an integer x to a pointer p, it doesn't always add x bytes. It adds x times sizeof(*p) (the size of the object to which p points).

float* row = (float*)((char*)devPtr + r * pitch);

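By casting devPtr to a char*, the offset that is applied (r * pitch) is counted in 1-byte increments, because sizeof(char) is 1. Had the cast not been there, the offset applied to devPtr would be r * pitch * sizeof(float) bytes, which is four times too far when a float is four bytes. The pitch that cudaMallocPitch returns is a row stride in bytes, so it has to be added to a byte pointer before the result is cast back to float*.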
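To make the byte stride concrete, here is a minimal host-only sketch that mimics a pitched layout with ordinary memory, so it runs without a GPU. The helper names (row_at, row_offset_in_floats, fill_and_read) and the 512-byte pitch are made up for illustration; the real pitch is whatever cudaMallocPitch reports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUDA API): start of row r in a pitched layout.
// pitch_bytes is the row stride in bytes, possibly larger than width * sizeof(float).
float* row_at(float* base, std::size_t pitch_bytes, int r) {
    // Step in bytes through a char*, then view the row start as float* again.
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + r * pitch_bytes);
}

// Distance, in floats, from the start of row 0 to the start of row r.
std::ptrdiff_t row_offset_in_floats(std::size_t pitch_bytes, int height, int r) {
    assert(pitch_bytes % sizeof(float) == 0 && r >= 0 && r < height);
    std::vector<float> storage(pitch_bytes / sizeof(float) * height);
    return row_at(storage.data(), pitch_bytes, r) - storage.data();
}

// Stores i * width + j in every element (i, j), then reads element (r, c) back.
float fill_and_read(int width, int height, std::size_t pitch_bytes, int r, int c) {
    assert(pitch_bytes % sizeof(float) == 0);
    assert(pitch_bytes >= width * sizeof(float));
    std::vector<float> storage(pitch_bytes / sizeof(float) * height, 0.0f);
    for (int i = 0; i < height; ++i) {
        float* row = row_at(storage.data(), pitch_bytes, i);
        for (int j = 0; j < width; ++j) {
            row[j] = static_cast<float>(i * width + j);
        }
    }
    return row_at(storage.data(), pitch_bytes, r)[c];
}
```

With width = 64 and an assumed pitch of 512 bytes, row 3 starts 3 * 512 / sizeof(float) floats past the base, not 3 * 64. That gap is the padding that indexing with devPtr + r * width would ignore.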

For example, if we have:

float* devPtr = (float*)1000;   // pretend the array starts at address 1000
int r = 4;

Now, let's leave out the cast:

float* result1 = (devPtr + r);
// address in result1 = 1000 + (r * sizeof(float)) = 1016, with a 4-byte float

Now, if we include the cast:

float* result2 = (float*)((char*)devPtr + r);
// address in result2 = 1000 + (r * sizeof(char)) = 1004
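The same arithmetic can be checked against a real buffer instead of the made-up address 1000. This is only a sketch: the function names and the 16-element buffer are assumptions for illustration, and the cast version stays a char* so that no misaligned float* is formed for arbitrary r.

```cpp
#include <cassert>
#include <cstddef>

// Bytes the address advances when r is added to a float* directly.
std::ptrdiff_t bytes_moved_without_cast(int r) {
    assert(r >= 0 && r <= 16);   // stay within (or one past) the buffer
    float buf[16] = {};
    float* p = buf;
    float* q = p + r;            // scaled by sizeof(float)
    return reinterpret_cast<char*>(q) - reinterpret_cast<char*>(p);
}

// Bytes the address advances when r is added after casting to char*.
std::ptrdiff_t bytes_moved_with_cast(int r) {
    assert(r >= 0 && r <= 16);
    float buf[16] = {};
    char* p = reinterpret_cast<char*>(buf);
    char* q = p + r;             // scaled by sizeof(char), which is 1
    return q - p;
}
```

For r = 4, the first function returns 4 * sizeof(float) and the second returns 4, matching the 1016 versus 1004 addresses above when a float is four bytes.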
Jonathon Reinhart answered Jan 20 '23 09:01