I am trying to solve a problem in which i am supposed to change a colour image to a greyscale image. For this purpose i am using CUDA parallel approach.
The kerne code i am invoking on the GPU is as follows.
__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
unsigned char* const greyImage,
int numRows, int numCols)
{
int absolute_image_position_x = blockIdx.x;
int absolute_image_position_y = blockIdx.y;
if ( absolute_image_position_x >= numCols ||
absolute_image_position_y >= numRows )
{
return;
}
uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;
}
void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage,
uchar4 * const d_rgbaImage,
unsigned char* const d_greyImage,
size_t numRows,
size_t numCols)
{
//You must fill in the correct sizes for the blockSize and gridSize
//currently only one block with one thread is being launched
const dim3 blockSize(numCols/32, numCols/32 , 1); //TODO
const dim3 gridSize(numRows/12, numRows/12 , 1); //TODO
rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage,
d_greyImage,
numRows,
numCols);
cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
i see a line of dots in the first pixel line.
error i am getting is
libdc1394 error: Failed to initialize libdc1394
Difference at pos 51 exceeds tolerance of 5
Reference: 255
GPU : 0
my input/output images
Can anyone help me with this??? thanks in advance.
I recently joined this course and tried your solution but it don't work so, i tried my own. You are almost correct. The correct solution is this:
__global__`
void rgba_to_greyscale(const uchar4* const rgbaImage,
unsigned char* const greyImage,
int numRows, int numCols)
{`
int pos_x = (blockIdx.x * blockDim.x) + threadIdx.x;
int pos_y = (blockIdx.y * blockDim.y) + threadIdx.y;
if(pos_x >= numCols || pos_y >= numRows)
return;
uchar4 rgba = rgbaImage[pos_x + pos_y * numCols];
greyImage[pos_x + pos_y * numCols] = (.299f * rgba.x + .587f * rgba.y + .114f * rgba.z);
}
The rest is same as your code.
Now, since I posted this question I have been continuously working on this problem
there are a couple of improvements that should be done in order to get this problem correct now I realize my initial solution was wrong .
Changes to be done:-
1. absolute_position_x =(blockIdx.x * blockDim.x) + threadIdx.x;
2. absolute_position_y = (blockIdx.y * blockDim.y) + threadIdx.y;
Secondly,
1. const dim3 blockSize(24, 24, 1);
2. const dim3 gridSize((numCols/16), (numRows/16) , 1);
In the solution we are using a grid of numCols/16 * numCols/16
and blocksize of 24 * 24
code executed in 0.040576 ms
@datenwolf : thanks for answering above!!!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With