Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Confusion over grid and block dimensions

Tags:

cuda

I am trying to solve the problem at the end of lesson 1 of the Udacity course but I'm not sure if I have just made a typo or if the actual code is wrong.

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage, unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
    size_t totalPixels = numRows * numCols;
    size_t gridRows = totalPixels / 32;
    size_t gridCols = totalPixels / 32;
    const dim3 blockSize(32,32,1);
    const dim3 gridSize(gridCols,gridRows,1);
    rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}

The other method is:

void rgba_to_greyscale(const uchar4* const rgbaImage, unsigned char* const greyImage, int numRows, int numCols)
{   
    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
    uchar4 rgba = rgbaImage[x * numCols + y];
    float channelSum = 0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z;
    greyImage[x * numCols + y] = channelSum;
}

Error message says the following:

libdc1394 error: failed to initialize libdc1394
Cuda error at student_func.cu:76
unspecified launch failure cudaGetLastError()
we were unable to execute your code. Did you set the grid and/or block size correctly?

But then, it says that the code has compiled,

Your code compiled!
error output: libdc1394 error: Failed to initialize libdc1394
Cuda error at student_func.cu:76
unspecified launch failure cudaGetLastError()

Line 76 is the last line in the first code block and as far as I'm aware I haven't changed anything in it. Line 76 is as follows,

rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);

I can't actually find the declaration of cudaGetLastError().

I'm mainly concerned with my understanding on setting up the grid/block dimensions + whether the first methods approach was right with regards to mapping between a 1D array of pixel positions and my threads.

EDIT: I guess I've misunderstood something. Is numRows the number of pixels in the vertical? And is numCols the pixels in horizontal direction?

My block is made up of 8 x 8 threads, where each thread represents 1 pixel? If so, I'm assuming that's why I had to divide by 4 when calculating gridRows since the image is not square? I'm assuming I could have also made a block that was 2:1 columns : rows?

Screen shot

EDIT 2: I just tried to change my block so that it was 2:1 ratio, so I could then divide numRows and numCol by the same number but its now showing blank areas at the bottom and side. Why is there blank areas both at the bottom and side. I haven't changed the y dimensions of by grid or block.

enter image description here

like image 993
Hans Rudel Avatar asked Jun 13 '13 11:06

Hans Rudel


People also ask

What is block dimension in CUDA?

' The maximum x, y and z dimensions of a block are 1024, 1024 and 64, and it should be allocated such that x × y × z ≤ 1024, which is the maximum number of threads per block.

How do you determine the number of threads blocks and grid in CUDA call?

Choosing the number of threads per block is very complicated. Most CUDA algorithms admit a large range of possibilities, and the choice is based on what makes the kernel run most efficiently. It is almost always a multiple of 32, and at least 64, because of how the thread scheduling hardware works.

What is dim3 CUDA?

dim3 is an integer vector type that can be used in CUDA code. Its most common application is to pass the grid and block dimensions in a kernel invocation. It can also be used in any user code for holding values of 3 dimensions.


1 Answers

each blocks processes 32*32 pixels, and there are (totalPixels / 32) * (totalPixels / 32) blocks, so you process totalPixels ^ 2 pixels - that seems wrong

1st was wrong, this should be the correct one:

const dim3 blockSize(32,32,1);

size_t gridCols = (numCols + blockSize.x - 1) / blockSize.x;
size_t gridRows = (numRows + blockSize.y - 1) / blockSize.y;

it is a pretty common pattern for 2d - you can remember it

in the sample image size is not power of two and you want block to cover all your image(or even more)

so next must be correct: gridCols * blockSize.x >= numCols gridRows * blockSize.y >= numRows

you choose block size and basing on it you compute amount of blocks you need to cover all image

after that, in the kernel, you must check that you are not 'out of image', for cases with bad size

another problem is in the kernel, it must be (y * numCols + x), not oposite

kernel:

int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;

if(x < numCols && y < numRows)
{
    uchar4 rgba = rgbaImage[y * numCols + x];
    float channelSum = 0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z;
    greyImage[y * numCols + x] = channelSum;
}

calling code:

const dim3 blockSize(4,32,1); // may be any

size_t gridCols = (numCols + blockSize.x - 1) / blockSize.x;
size_t gridRows = (numRows + blockSize.y - 1) / blockSize.y;

const dim3 gridSize(gridCols,gridRows,1);
rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
cudaDeviceSynchronize(); 
checkCudaErrors(cudaGetLastError());

damn it, i feel like i am doing things even harder to understand (

like image 188
ShPavel Avatar answered Sep 24 '22 23:09

ShPavel