I want to use VNDetectTextRectanglesRequest from the Vision framework to detect text regions in an image that contains only a single character, the digit '9', on a white background. I'm using the following code to do this:
    private func performTextDetection() {
        let textRequest = VNDetectTextRectanglesRequest(completionHandler: self.detectTextHandler)
        textRequest.reportCharacterBoxes = true
        textRequest.preferBackgroundProcessing = false
        let handler = VNImageRequestHandler(cgImage: loadedImage.cgImage!, options: [:])
        DispatchQueue.global(qos: .userInteractive).async {
            do {
                try handler.perform([textRequest])
            } catch {
                print("Error")
            }
        }
    }

    func detectTextHandler(request: VNRequest, error: Error?) {
        guard let observations = request.results, !observations.isEmpty else {
            fatalError("no results")
        }
        print("there is result")
    }
I get 0 observations. However, if I provide an image with the text '123' on a black background, '123' is detected as a text region. The same problem occurs for two-digit numbers: '22' on a white background is not detected either.
Why does the Vision API detect only numbers with three or more digits on a white background in my case?
Lone characters continue to be a problem for VNRecognizeTextRequest and VNDetectTextRectanglesRequest in Xcode 12.5 and Swift 5.
I've seen VNDetectTextRectanglesRequest find virtually all the individual words on a sheet of paper but fail to detect lone characters when processing the entire image. Setting the request's regionOfInterest property to a smaller region may help.
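As a rough sketch of what restricting the region might look like: the helper name `detectTextRectangles(in:regionOfInterest:)` is made up for this example, and the request runs synchronously for brevity. `regionOfInterest` takes a normalized rectangle (0...1 on both axes) with its origin at the lower-left corner of the image.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: runs text-rectangle detection on only part of an image.
/// `regionOfInterest` is in normalized coordinates, origin at the lower left.
func detectTextRectangles(in image: CGImage,
                          regionOfInterest: CGRect) throws -> [VNTextObservation] {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = true
    // Limit processing to the given part of the image.
    request.regionOfInterest = regionOfInterest

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Keep only the text observations; anything else is dropped.
    return request.results?.compactMap { $0 as? VNTextObservation } ?? []
}
```

Whether a given lone character is then detected still depends on how much of the region it fills, so treat this as a starting point for experiments rather than a guaranteed fix.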
What has worked for me is to have the single characters occupy more of the region of interest (ROI) for VNRecognizeTextRequest. I tested single characters at a variety of heights, and it became clear that single characters would start reading once they reached a certain size within the ROI.
For some single characters, detection seems to occur when the ROI is roughly three times the width and three times the height of the character itself. That's a rather tight region of interest. Placing it correctly is another problem, but also solvable.
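To make the "three times the width and height" idea concrete, here is a small geometry sketch. It assumes you already have a rough estimate of where the character sits, expressed in Vision's normalized coordinates; `paddedRegionOfInterest(around:scale:)` is a hypothetical helper, and the default factor of 3 is only the value that seemed to work in my tests, not documented Vision behavior.

```swift
import Foundation

/// Hypothetical helper: returns a region centered on `characterBox` that is
/// `scale` times as wide and as tall, clipped to the unit square.
/// `characterBox` is assumed to be in normalized coordinates (0...1).
func paddedRegionOfInterest(around characterBox: CGRect,
                            scale: CGFloat = 3.0) -> CGRect {
    let width = characterBox.width * scale
    let height = characterBox.height * scale
    // Clamp each edge so the region never leaves the image.
    let minX = max(0, characterBox.midX - width / 2)
    let minY = max(0, characterBox.midY - height / 2)
    let maxX = min(1, characterBox.midX + width / 2)
    let maxY = min(1, characterBox.midY + height / 2)
    return CGRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}
```

The resulting rectangle can be assigned to the request's regionOfInterest. Near the image border the region is clipped, so the character ends up off-center there.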
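If processing time isn't an issue for your application, you can create an array of rectangles ([CGRect]) spanning a region suspected to contain lone characters, and run the request once per rectangle with that rectangle as the region of interest.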
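One way to build such an array is a grid of overlapping tiles. This is only a sketch: `candidateRegions(in:tileSize:overlap:)` is a hypothetical helper, all rectangles are assumed to be in normalized coordinates, and the tile is assumed to fit inside the search area.

```swift
import Foundation

/// Hypothetical helper: covers `searchArea` with overlapping tiles of
/// `tileSize`. `overlap` is the fraction shared by neighboring tiles.
func candidateRegions(in searchArea: CGRect,
                      tileSize: CGSize,
                      overlap: CGFloat = 0.5) -> [CGRect] {
    precondition(tileSize.width > 0 && tileSize.width <= searchArea.width)
    precondition(tileSize.height > 0 && tileSize.height <= searchArea.height)
    precondition(overlap >= 0 && overlap < 1)

    let stepX = tileSize.width * (1 - overlap)
    let stepY = tileSize.height * (1 - overlap)
    // Number of tile positions needed to reach the far edge on each axis.
    let columns = Int(Double((searchArea.width - tileSize.width) / stepX).rounded(.up)) + 1
    let rows = Int(Double((searchArea.height - tileSize.height) / stepY).rounded(.up)) + 1

    var regions: [CGRect] = []
    for row in 0..<rows {
        for column in 0..<columns {
            // Pull the last tile back so it stays flush with the edge.
            let x = min(searchArea.minX + CGFloat(column) * stepX,
                        searchArea.maxX - tileSize.width)
            let y = min(searchArea.minY + CGFloat(row) * stepY,
                        searchArea.maxY - tileSize.height)
            regions.append(CGRect(x: x, y: y,
                                  width: tileSize.width, height: tileSize.height))
        }
    }
    return regions
}
```

Each rectangle would then be used as the regionOfInterest of its own request. The overlap is there so that a character sitting on a tile boundary still falls fully inside at least one neighboring tile; the cost grows with the number of tiles.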
My suspicion is that when VNRecognizeTextRequest performs an initial check for edge content, edge density, and/or image features that resemble strokes, it exits early if it doesn't find enough candidates. That initial check may simply be an embedded VNDetectTextRectanglesRequest. Whatever the initial check is, it runs fast, so I don't imagine it's that complicated.
For more about using stroke detection to find characters, search for SO posts and articles about the Stroke Width Transform (SWT). See also this Microsoft Research publication: https://www.microsoft.com/en-us/research/publication/detecting-text-in-natural-scenes-with-stroke-width-transform/. The SWT is meant to work on "natural" images, such as text seen outdoors.
There are some hacks to get around the problem. Some of these hacks are unpleasant, but for a particular application they may be worth it.