I wrote a simple CUDA kernel for Box Filtering of an image.
texture<unsigned char,2> tex8u;
#define FILTER_SIZE 7
#define FILTER_OFFSET (FILTER_SIZE/2)
__global__ void box_filter_8u_c1(unsigned char* out, int width, int height, int pitch)
{
unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if(x>=width || y>=height) return;
float val = 0.0f;
for(int i = -FILTER_OFFSET; i<= FILTER_OFFSET; i++)
for(int j= -FILTER_OFFSET; j<= FILTER_OFFSET; j++)
val += tex2D(tex8u,x + i, y + j);
out[y * pitch + x] = static_cast<unsigned char>(val/(FILTER_SIZE * FILTER_SIZE));
}
The problem with the above code was that the top and left borders of the image were filtered incorrectly. They contained values from the bottom and right borders respectively. Width of the incorrect borders was equal to FILTER_OFFSET
.
But when I changed the x
and y
indices to int
instead of unsigned int
, the output was perfect.
Question: Why is it so?
P.S: The texture addressing mode is set to cudaAddressModeClamp
for both x and y directions.
The root cause of this has nothing to do with CUDA, it is basic C type conversion rules which are causing the result you see. The C99 standard says the following about how the conversion is performed:
6.3.1.8 Usual arithmetic conversions
- If both operands have the same type, then no further conversion is needed.
- Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
- Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
- Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
- Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
The third point implies that the signed integer (so i
and j
in this case) is first converted to an unsigned integer and the added to the unsigned integer (x
and y
). The result of converting a negative signed integer to an unsigned integer is implementation specific, but here, a straightforward two's complement representation will turn a small negative integer into a very large unsigned integer. The read mode of your texture clamps this out-of-range coordinate to the maximum allowable in the texture, and your kernel winds up reading from the wrong side of the texture.
If you use signed integers, no conversion occurs, and this whole problem disappears. The moral of this story is probably "know thy programming language".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With