UPDATE: Please see additional question below with more code;
I am trying to code a category for blurring an image. My starting point is Jeff LaMarche's sample here. Whilst this (after the fixes suggested by others) works fine, it is an order of magnitude too slow for my requirements - on a 3GS it takes maybe 3 seconds to do a decent blur and I'd like to get this down to under 0.5 sec for a full screen (faster is better).
He mentions the Accelerate framework as a performance enhancement so I've spent the last day looking at this, and in particular vDSP_f3x3 which according to the Apple Documenation
Filters an image by performing a two-dimensional convolution with a 3x3 kernel; single precision.
Perfect - I have a suitable filter matrix, and I have an image ... but this is where I get stumped.
vDSP_f3x3 assumes image data is (float *) but my image comes from;
srcData = (unsigned char *)CGBitmapContextGetData (context);
and the context comes from CGBitmapContextCreate with kCGImageAlphaPremultipliedFirst, so my srcData is really ARGB with 8 bits per component.
I suspect what I really need is a context with float components, but according to the Quartz documentation here, kCGBitMapFloatComponents is only available on Mac OS and not iOS :-(
Is there a really fast way using the accelerate framework of converting the integer components I have into the float components that vDSP_f3x3 needs? I mean I could do it myself, but by the time I do that, then the convolution, and then convert back, I suspect I'll have made it even slower than it is now since I might as well convolve as I go.
Maybe I have the wrong approach?
Does anyone have some tips for me having done some image processing on the iphone using vDSP? The documentation I can find is very reference orientated and not very newbie friendly when it comes to this sort of thing.
If anyone has a reference for really fast blurring (and high quality, not the reduce resolution and then rescale stuff I've seen and looks pants) that would be fab!
EDIT:
Thanks @Jason. I've done this and it is almost working, but now my problem is that although the image does blur, on every invocation it shifts left 1 pixel. It also seems to make the image black and white, but that could be something else.
Is there anything in this code that leaps out as obviously incorrect? I haven't optimised it yet and it's a bit rough, but hopefully the convolution code is clear enough.
CGImageRef CreateCGImageByBlurringImage(CGImageRef inImage, NSUInteger pixelRadius, NSUInteger gaussFactor)
{
unsigned char *srcData, *finalData;
CGContextRef context = CreateARGBBitmapContext(inImage);
if (context == NULL)
return NULL;
size_t width = CGBitmapContextGetWidth(context);
size_t height = CGBitmapContextGetHeight(context);
size_t bpr = CGBitmapContextGetBytesPerRow(context);
int componentsPerPixel = 4; // ARGB
CGRect rect = {{0,0},{width,height}};
CGContextDrawImage(context, rect, inImage);
// Now we can get a pointer to the image data associated with the bitmap
// context.
srcData = (unsigned char *)CGBitmapContextGetData (context);
if (srcData != NULL)
{
size_t dataSize = bpr * height;
finalData = malloc(dataSize);
memcpy(finalData, srcData, dataSize);
//Generate Gaussian kernel
float *kernel;
// Limit the pixelRadius
pixelRadius = MIN(MAX(1,pixelRadius), 248);
int kernelSize = pixelRadius * 2 + 1;
kernel = malloc(kernelSize * sizeof *kernel);
int gauss_sum =0;
for (int i = 0; i < pixelRadius; i++)
{
kernel[i] = 1 + (gaussFactor*i);
kernel[kernelSize - (i + 1)] = 1 + (gaussFactor * i);
gauss_sum += (kernel[i] + kernel[kernelSize - (i + 1)]);
}
kernel[(kernelSize - 1)/2] = 1 + (gaussFactor*pixelRadius);
gauss_sum += kernel[(kernelSize-1)/2];
// Scale the kernel
for (int i=0; i<kernelSize; ++i) {
kernel[i] = kernel[i]/gauss_sum;
}
float * srcAsFloat,* resultAsFloat;
srcAsFloat = malloc(width*height*sizeof(float)*componentsPerPixel);
resultAsFloat = malloc(width*height*sizeof(float)*componentsPerPixel);
// Convert uint source ARGB to floats
vDSP_vfltu8(srcData,1,srcAsFloat,1,width*height*componentsPerPixel);
// Convolve (hence the -1) with the kernel
vDSP_conv(srcAsFloat, 1, &kernel[kernelSize-1],-1, resultAsFloat, 1, width*height*componentsPerPixel, kernelSize);
// Copy the floats back to ints
vDSP_vfixu8(resultAsFloat, 1, finalData, 1, width*height*componentsPerPixel);
free(resultAsFloat);
free(srcAsFloat);
}
size_t bitmapByteCount = bpr * height;
CGDataProviderRef dataProvider = CGDataProviderCreateWithData(NULL, finalData, bitmapByteCount, &providerRelease);
CGImageRef cgImage = CGImageCreate(width, height, CGBitmapContextGetBitsPerComponent(context),
CGBitmapContextGetBitsPerPixel(context), CGBitmapContextGetBytesPerRow(context), CGBitmapContextGetColorSpace(context), CGBitmapContextGetBitmapInfo(context),
dataProvider, NULL, true, kCGRenderingIntentDefault);
CGDataProviderRelease(dataProvider);
CGContextRelease(context);
return cgImage;
}
I should add that if I comment out the vDSP_conv line, and change the line following to;
vDSP_vfixu8(srcAsFloat, 1, finalData, 1, width*height*componentsPerPixel);
Then as expected, my result is a clone of the original source. In colour and not shifted left. This implies to me that it IS the convolution that is going wrong, but I can't see where :-(
THOUGHT: Actually thinking about this, it seems to me that the convolve needs to know the input pixels are in ARGB format as otherwise the convolution will be multiplying the values together with no knowledge about their meaning (ie it will multiple R * B etc). This would explain why I get a B&W result I think, but not the shift. Again, I think there might need to be more to it than my naive version here ...
FINAL THOUGHT: I think the shifting left is a natural result of the filter and I need to look at the image dimensions and possibly pad it out ... so I think the code is actually working OK given what I've fed it.
While the Accelerate framework will be faster than simple serial code, you'll probably never see the greatest performance by blurring an image using it.
My suggestion would be to use an OpenGL ES 2.0 shader (for devices that support this API) to do a two-pass box blur. Based on my benchmarks, the GPU can handle these kinds of image manipulation operations at 14-28X the speed of the CPU on an iPhone 4, versus the maybe 4.5X that Apple reports for the Accelerate framework in the best cases.
Some code for this is described in this question, as well as in the "Post-Processing Effects on Mobile Devices" chapter in the GPU Pro 2 book (for which the sample code can be found here). By placing your image in a texture, then reading values in between pixels, bilinear filtering on the GPU gives you some blurring for free, which can then be combined with a few fast lookups and averaging operations.
If you need a starting project to feed images into the GPU for processing, you might be able to use my sample application from the article here. That sample application passes AVFoundation video frames as textures into a processing shader, but you can modify it to send in your particular image data and run your blur operation. You should be able to use my glReadPixels()
code to then retrieve the blurred image for later use.
Since I originally wrote this answer, I've created an open source image and video processing framework for doing these kinds of operations on the GPU. The framework has several different blur types within it, all of which can be applied very quickly to images or live video. The GPUImageGaussianBlurFilter, which applies a standard 9-hit Gaussian blur, runs in 16 ms for a 640x480 frame of video on the iPhone 4. The GPUImageFastBlurFilter is a modified 9-hit Gaussian blur that uses hardware filtering, and it runs in 2.0 ms for that same video frame. Likewise, there's a GPUImageBoxBlurFilter that uses a 5-pixel box and runs in 1.9 ms for the same image on the same hardware. I also have median and bilateral blur filters, although they need a little performance tuning.
In my benchmarks, Accelerate doesn't come close to these kinds of speeds, especially when it comes to filtering live video.
You definitely want to convert to float
to perform the filtering since that is what the accelerated functions take, plus it is a lot more flexible if you want to do any additional processing. The computation time of a 2-D convolution (filter) will most likely dwarf any time spent in conversion. Take a look at the function vDSP_vfltu8()
which will quickly convert the uint8 data to float. vDSP_vfixu8()
will convert it back to uint8.
To perform a blur, you are probably going to want a bigger convolution kernel than 3x3 so I would suggest using the function vDSP_imgfir()
which will take any kernel size.
Response to edit:
A few things:
You need to perform the filtering on each color channel independently. That is, you need to split the R, G, and B components into their own images (of type float), filter them, then remultiplex them into the ARGB image.
vDSP_conv
computes a 1-D convolution, but to blur an image, you really need a 2-D convolution. vDSP_imgfir
essentially computes the 2-D convolution. For this you will need a 2-D kernel as well. You can look up the formula for a 2-D Gaussian function to produce the kernel.
Note: You actually can perform a 2-D convolution using 1-D convolutions if your kernel is seperable (which Gaussian is). I won't go into what that means, but you essentially have to perform 1-D convolution across the columns and then perform 1-D convolution across the resulting rows. I would not go this route unless you know what you are doing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With