The CUDA NPP library supports image filtering with nppiFilter_8u_C1R, but I keep getting errors when I use it. I have no problem getting the boxFilterNPP sample code up and running.
eStatusNPP = nppiFilterBox_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                  oDeviceDst.data(), oDeviceDst.pitch(),
                                  oSizeROI, oMaskSize, oAnchor);
But if I change it to use nppiFilter_8u_C1R instead, eStatusNPP returns the error -24 (NPP_TEXTURE_BIND_ERROR). The code below shows the alterations I made to the original boxFilterNPP sample.
NppiSize oMaskSize = {5,5};
npp::ImageCPU_32s_C1 hostKernel(5,5);
for(int x = 0 ; x < 5; x++){
    for(int y = 0 ; y < 5; y++){
        hostKernel.pixels(x,y)[0].x = 1;
    }
}
npp::ImageNPP_32s_C1 pKernel(hostKernel);
Npp32s nDivisor = 1;
eStatusNPP = nppiFilter_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                               oDeviceDst.data(), oDeviceDst.pitch(),
                               oSizeROI,
                               pKernel.data(),
                               oMaskSize, oAnchor,
                               nDivisor);
This has been tried on CUDA 4.2 and 5.0, with the same result.
The code runs with the expected result when oMaskSize = {1,1}.
nppiFilter applies the mask extending upward and to the left, following the mathematical convention that the convolution of two functions reverses the direction of the second function.
The box filter mask extends downwards and to the right, which is probably more intuitive.
In any case, the problem is caused by the fact that the input image in the changed code would have to be sampled at what would effectively be SOURCE[-4, -4] in order to compute DESTINATION[0, 0]. Since the input image is accessed via a texture sampler, binding the source image pointer offset by (-4, -4) causes the texture-bind error you're seeing.
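To make the offset arithmetic concrete, here is a small CPU-only sketch. It assumes the model described in this answer, where mask element i along one axis reads the source at offset (anchor - i). The helper sourceOffsetRange is hypothetical and is not part of NPP; it only illustrates why an anchor of 0 reaches back to -4 for a 5-wide mask.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper, not part of NPP. Assumed model (taken from the
// explanation above): mask element i, 0 <= i < maskSize, reads the source
// at offset (anchor - i) relative to the destination pixel.
// Returns {lowest, highest} offset touched along one axis.
std::pair<int, int> sourceOffsetRange(int maskSize, int anchor) {
    assert(maskSize > 0);
    int lowest = anchor - (maskSize - 1);  // reached by the last mask element
    int highest = anchor;                  // reached by mask element 0
    return {lowest, highest};
}
```

Under this model, a 5-wide mask with anchor 0 touches offsets -4 through 0, which falls outside the image at DESTINATION[0, 0], while anchor 4 touches offsets 0 through 4.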
Workaround: The simplest workaround for this issue is to set the anchor point to (4, 4), which effectively moves the mask down and to the right. Be aware that you would still want to invert the order of the weights in the kernel array (i.e. K[-4, -4] -> K[0, 0], K[0, 0] -> K[-4, -4], etc.).
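Inverting the weights as described amounts to flipping the kernel along both axes. A minimal host-side sketch follows, assuming the kernel is stored as a row-major 1D array; flipKernel is a hypothetical helper, not an NPP function, and std::int32_t stands in for Npp32s.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of NPP. For a row-major width*height kernel,
// reversing the flat array moves element (x, y) to
// (width - 1 - x, height - 1 - y), i.e. it flips both axes at once.
std::vector<std::int32_t> flipKernel(std::vector<std::int32_t> kernel,
                                     int width, int height) {
    assert(kernel.size() == static_cast<std::size_t>(width) * height);
    std::reverse(kernel.begin(), kernel.end());
    return kernel;
}
```

For a symmetric kernel such as the all-ones 5x5 mask in the question, the flip leaves the weights unchanged, so it only matters for asymmetric kernels.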
I had the same problem when I stored my kernel as an ImageCPU/ImageNPP.
A good solution is to store the kernel as a traditional 1D array on the device. I tried this, and it gave me good results (and none of those unpredictable or garbage images).
Thanks to Frank Jargstorff in this StackOverflow post for the 1D idea.
NppiSize oMaskSize = {5,5};
Npp32s hostKernel[5*5];
for(int x = 0 ; x < 5; x++){
    for(int y = 0 ; y < 5; y++){
        hostKernel[x*5+y] = 1;
    }
}
Npp32s* pKernel; //just a regular 1D array on the GPU
cudaMalloc((void**)&pKernel, 5 * 5 * sizeof(Npp32s));
cudaMemcpy(pKernel, hostKernel, 5 * 5 * sizeof(Npp32s), cudaMemcpyHostToDevice);
Using this original image, here's the blurred result that I get from your code with the 1D kernel array:
Other parameters that I used:
Npp32s nDivisor = 25;
NppiPoint oAnchor = {4, 4};
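If you want to sanity-check the GPU output for these parameters, a CPU reference for a single pixel can help. The sketch below is an assumption-laden illustration, not NPP code: windowAverage is a hypothetical helper, it assumes an all-ones kernel, a tightly packed row-major 8-bit image, and a window that starts at (x, y) and extends down and to the right (the anchor (4, 4) case as described above). It uses truncating integer division, which may not match NPP's rounding exactly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU check, not part of NPP. Sums the maskSize x maskSize
// window whose top-left corner is (x, y) and divides by divisor.
// Assumes every kernel weight is 1 and the image has no row padding.
int windowAverage(const std::vector<std::uint8_t>& image, int imageWidth,
                  int x, int y, int maskSize, int divisor) {
    assert(divisor != 0);
    assert(x >= 0 && y >= 0 && x + maskSize <= imageWidth);
    assert((y + maskSize) * imageWidth <= static_cast<int>(image.size()));
    int sum = 0;
    for (int j = 0; j < maskSize; ++j) {
        for (int i = 0; i < maskSize; ++i) {
            sum += image[(y + j) * imageWidth + (x + i)];
        }
    }
    return sum / divisor;  // truncating division; NPP's rounding may differ
}
```

With maskSize = 5 and divisor = 25 this is a plain 5x5 mean, so a constant image should come back unchanged in the interior.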