
Apple Vision – Can't recognize a single number as region

I want to use VNDetectTextRectanglesRequest from the Vision framework to detect text regions in an image that contains a single character, the digit '9', on a white background. I'm using the following code:

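private func performTextDetection() {
    // Avoid the force-unwrap: cgImage is nil for images without a CGImage backing.
    guard let cgImage = loadedImage.cgImage else {
        print("loadedImage has no CGImage backing")
        return
    }

    let textRequest = VNDetectTextRectanglesRequest(completionHandler: self.detectTextHandler)
    textRequest.reportCharacterBoxes = true
    textRequest.preferBackgroundProcessing = false

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

    // Run the request off the main thread; perform(_:) is synchronous.
    DispatchQueue.global(qos: .userInteractive).async {
        do {
            try handler.perform([textRequest])
        } catch {
            print("Text detection failed: \(error)")
        }
    }
}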

func detectTextHandler(request: VNRequest, error: Error?) {
    // An empty result set is a normal outcome, so report it instead of crashing.
    guard let observations = request.results, !observations.isEmpty else {
        print("no results")
        return
    }

    print("there is result")
}

The number of observations I get is 0. However, if I provide an image with the text '123' on a black background, '123' is detected as a text region. The problem also occurs for two-digit numbers: '22' on a white background is not detected either.

Why does the Vision API detect only numbers with three or more digits on a white background in my case?

AndrzejZ asked Jan 06 '18 11:01


1 Answer

Lone characters continue to be a problem for VNRecognizeTextRequest and VNDetectTextRectanglesRequest in Xcode 12.5 and Swift 5.

I've seen VNDetectTextRectanglesRequest find virtually all the individual words on a sheet of paper but fail to detect lone characters when processing the entire image. Setting the request's regionOfInterest property to a smaller region may help.
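A minimal sketch of restricting the request to a smaller region might look like the following. Vision expects regionOfInterest as a normalized rectangle (0 to 1 on both axes, origin at the lower left). The names clampedToUnitSquare and detectText(in:roi:) are hypothetical helpers introduced for illustration, not part of Vision.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Clamps a normalized rectangle to the unit square that Vision expects.
/// (Hypothetical helper, not a Vision API.)
func clampedToUnitSquare(_ rect: CGRect) -> CGRect {
    let minX = max(0, min(1, rect.minX))
    let minY = max(0, min(1, rect.minY))
    let maxX = max(0, min(1, rect.maxX))
    let maxY = max(0, min(1, rect.maxY))
    return CGRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}

#if canImport(Vision)
/// Runs text-rectangle detection on only the given normalized region.
@available(iOS 11.0, macOS 10.13, tvOS 11.0, *)
func detectText(in image: CGImage, roi: CGRect) throws -> [VNTextObservation] {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = true
    request.regionOfInterest = clampedToUnitSquare(roi)

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])  // synchronous; call off the main thread
    return request.results as? [VNTextObservation] ?? []
}
#endif
```

Whether a given ROI is small enough for a lone character to be detected has to be found by experiment; the sketch only shows where the property is set.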

What has worked for me is to have the single characters occupy more of the region of interest (ROI) for VNRecognizeTextRequest. I tested single characters at a variety of heights, and it became clear that single characters would start to be read once they reached a certain size within the ROI.

For some single characters, detection seems to occur when the ROI is roughly three times the width and three times the height of the character itself. That's a rather tight region of interest. Placing it correctly is another problem, but also solvable.
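As a rough illustration of that sizing rule, the sketch below builds an ROI centered on a suspected character box and three times its width and height, clamped to the normalized unit square. The function name and the default factor of 3 are assumptions taken from the observation above, not values documented by Apple.

```swift
import Foundation

/// Returns a normalized ROI centered on `character` and `factor` times its
/// width and height, clamped to the unit square.
/// (Hypothetical helper; the factor of 3 is an empirical guess.)
func regionOfInterest(around character: CGRect, factor: CGFloat = 3) -> CGRect {
    let width = character.width * factor
    let height = character.height * factor
    let minX = max(0, character.midX - width / 2)
    let minY = max(0, character.midY - height / 2)
    let maxX = min(1, character.midX + width / 2)
    let maxY = min(1, character.midY + height / 2)
    return CGRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}
```

Near the image border the ROI is cut off by the clamp, so the character ends up off-center; whether that affects detection is something to test on your own images.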

If processing time isn't an issue for your application, you can create an array of small ROIs ([CGRect]) that tiles a region suspected to contain lone characters, then run the text request on each ROI in turn.
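One way to sketch the ROI-array idea is shown here. gridOfRegions and recognizeText(in:regions:) are hypothetical names, and the grid uses non-overlapping tiles for simplicity; a character that straddles a tile boundary would probably be missed, so overlapping tiles may work better in practice.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Splits a normalized region into rows x columns equal tiles.
/// (Hypothetical helper, not a Vision API.)
func gridOfRegions(in region: CGRect, rows: Int, columns: Int) -> [CGRect] {
    guard rows > 0, columns > 0 else { return [] }
    let tileWidth = region.width / CGFloat(columns)
    let tileHeight = region.height / CGFloat(rows)
    var tiles: [CGRect] = []
    for row in 0..<rows {
        for column in 0..<columns {
            tiles.append(CGRect(x: region.minX + CGFloat(column) * tileWidth,
                                y: region.minY + CGFloat(row) * tileHeight,
                                width: tileWidth,
                                height: tileHeight))
        }
    }
    return tiles
}

#if canImport(Vision)
/// Runs one text-recognition request per ROI and collects the best candidates.
@available(iOS 13.0, macOS 10.15, tvOS 13.0, *)
func recognizeText(in image: CGImage,
                   regions: [CGRect]) throws -> [(region: CGRect, text: String)] {
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    var hits: [(region: CGRect, text: String)] = []
    for region in regions {
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        request.regionOfInterest = region
        try handler.perform([request])  // one synchronous pass per tile
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        for observation in observations {
            if let best = observation.topCandidates(1).first {
                hits.append((region: region, text: best.string))
            }
        }
    }
    return hits
}
#endif
```

Processing time grows with the number of tiles, which is why this only makes sense when speed is not critical.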

My suspicion is that when VNRecognizeTextRequest performs an initial check for edge content, edge density, and/or image features that resemble strokes, it exits early if it doesn't find enough candidates. That initial check may simply be an embedded VNDetectTextRectanglesRequest. Whatever the initial check is, it runs fast, so I don't imagine it's that complicated.

For more about stroke detection to find characters, search for SO posts and articles about the Stroke Width Transform (SWT). See also the Microsoft Research paper: https://www.microsoft.com/en-us/research/publication/detecting-text-in-natural-scenes-with-stroke-width-transform/. The SWT is meant to work on "natural" images, such as text seen outdoors.

There are some hacks to get around the problem. Some of these hacks are unpleasant, but for a particular application they may be worth it.

  • Create a grid of small regions of interest (ROIs). Run the text request on one ROI after the other.
  • As a cheap substitute for VNDetectTextRectanglesRequest, look for regions of the image with edge content that suggests a single character may be present. If nothing else, this could help ignore regions where there is no edge content.
  • Try using a scaling filter to scale up the image before processing it. That could ensure single characters are big enough to read. (For CIFilters, a very handy resource is https://cifilter.io/)
  • Run multiple passes on your image. First, run OCR on the full image. Then get the bounding boxes for words that were read. Search for suspicious gaps between boxes. Run grids of small ROIs on the suspiciously blank regions.
  • Use Tesseract as a backup. (https://www.seemuapps.com/swift-optical-character-recognition-tutorial)
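The scale-up hack from the list above could be sketched with Core Image's CILanczosScaleTransform filter, as below. The helper names and the idea of picking a target character height in pixels are assumptions for illustration; the answer does not give a specific size at which characters start to be read.

```swift
import Foundation
#if canImport(CoreImage)
import CoreImage
#endif

/// Scale factor needed to bring a character of `characterHeight` pixels up to
/// `targetHeight` pixels. Never returns less than 1, so images are not shrunk.
/// (Hypothetical helper; choose `targetHeight` by experiment.)
func upscaleFactor(characterHeight: CGFloat, targetHeight: CGFloat) -> CGFloat {
    guard characterHeight > 0 else { return 1 }
    return max(1, targetHeight / characterHeight)
}

#if canImport(CoreImage)
/// Upscales an image with Lanczos resampling, keeping the aspect ratio.
func upscaled(_ image: CIImage, by factor: CGFloat) -> CIImage? {
    guard let filter = CIFilter(name: "CILanczosScaleTransform") else { return nil }
    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(factor, forKey: kCIInputScaleKey)
    filter.setValue(1.0, forKey: kCIInputAspectRatioKey)
    return filter.outputImage
}
#endif
```

The resulting CIImage can be passed to VNImageRequestHandler(ciImage:options:). Upscaling increases memory use and processing time, so it is probably best applied to a cropped region rather than the full image.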
Rethunk answered Oct 26 '22 12:10