Using GPUImage and GPUImageHoughTransformLineDetector to detect highlighted text bounding box

I am using GPUImageHoughTransformLineDetector to try to detect the highlighted text in the image:

[image: a photo of text with a light blue highlight box around part of it]

I am using the following code to try and detect the bounding blue box lines:

GPUImagePicture *stillImageSource = [[GPUImagePicture alloc] initWithImage:rawImage];
GPUImageHoughTransformLineDetector *lineFilter = [[GPUImageHoughTransformLineDetector alloc] init];
[stillImageSource addTarget:lineFilter];
GPUImageLineGenerator *lineDrawFilter = [[GPUImageLineGenerator alloc] init];
[lineDrawFilter forceProcessingAtSize:rawImage.size];

__weak typeof(self) weakSelf = self;
[lineFilter setLinesDetectedBlock:^(GLfloat *flt, NSUInteger count, CMTime time) {
    NSLog(@"Number of lines: %lu", (unsigned long)count);
    GPUImageAlphaBlendFilter *blendFilter = [[GPUImageAlphaBlendFilter alloc] init];
    [blendFilter forceProcessingAtSize:rawImage.size];
    [stillImageSource addTarget:blendFilter];
    [lineDrawFilter addTarget:blendFilter];

    [blendFilter useNextFrameForImageCapture];
    [lineDrawFilter renderLinesFromArray:flt count:count frameTime:time];
    weakSelf.doneProcessingImage([blendFilter imageFromCurrentFramebuffer]);
}];
[stillImageSource processImage];

Every time I run this, regardless of the edgeThreshold setting, I get 1023 lines, and the resulting output looks like:

[image: the output, with many spurious detected lines drawn across the frame]

It is unclear to me why changing the threshold does not do anything, but I am sure I am misunderstanding something. Anyone have any ideas on how to best do this?

asked Dec 16 '14 by Ian Ownbey


1 Answer

I just made some improvements to the Hough transform line detector in the framework that will help with this, but you're going to need to do some additional preprocessing to your image to pick out just that blue box.

Let me explain how this operation works. First, it detects edges in an image. For each pixel determined to be an edge (right now, I'm using a Canny edge detector for this), the coordinate of that pixel is extracted. Each of those coordinates is then used to draw a pair of lines in parallel coordinate space (based on the process described in "Real-Time Detection of Lines using Parallel Coordinates and OpenGL" by Dubská, et al.).

Pixels in parallel coordinate space where lines intersect will increase in intensity. The points of greatest intensity in parallel coordinate space indicate the presence of a line in the real-world scene.

However, only the pixels that are local maxima for intensity indicate real lines. The challenge is in determining local maxima to suppress noise from busy scenes. That's what I haven't totally solved in this operation. In your image above, the huge number of lines is due to a mess of points being above the detection threshold in parallel coordinate space, but not being properly removed for not being local maxima.

I did make some improvements, though, so I am getting a cleaner output from the operation now (I just did this quickly off a live video feed of my screen):

[image: cleaner line-detection output, captured from a live video feed of the screen]

I fixed a bug in the local non-maximum suppression filter and expanded the area it works over from 3x3 to 5x5. It's still leaving behind a bunch of non-maximum points which contribute to noise, but it's much better.

You'll notice this still doesn't quite do what you want. It's picking up lines in the text, but not your box. That's because the black text on a white background produces very strong, very sharp edges at the edge detection stage, but the light blue selection box on a white background needs an extremely low threshold to even be picked up in any edge detection process.

If you're always going to be picking out a blue selection box, what I'd recommend is that you run a preprocessing operation to uniquely identify blue objects in the scene. A simple way to do this would be to define a custom filter that subtracts the red component from the blue for each pixel, flooring negative values and taking the result of that calculation as the output for the red, green, and blue channels. You might even want to multiply the result by 2.0-3.0 to intensify this difference.

The result of that should be an image where blue areas in your image show as white and everywhere else as black. That'll greatly improve the contrast around your selection box and make it easier to pick out from the text. You'll need to experiment with the right parameters to get this to be as reliable as you want in your case.

answered Sep 29 '22 by Brad Larson