I'm looking through the Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages
:
1) class VNDetectTextRectanglesRequest
2) class VNTextObservation
It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that can be interpreted by NSLinguisticTagger
?
Here's a post that is a brief overview of Vision
.
Thank you for reading.
SwiftOCR
I just got SwiftOCR to work with small sets of text.
https://github.com/garnele007/SwiftOCR
uses
https://github.com/Swift-AI/Swift-AI
which uses NeuralNet-MNIST model for text recognition.
TODO : VNTextObservation > SwiftOCR
Will post example of it using VNTextObservation once I have it one connected to the other.
OpenCV + Tesseract OCR
I tried to use OpenCV + Tesseract but got compile errors then found SwiftOCR.
SEE ALSO : Google Vision iOS
Note Google Vision Text Recognition - Android sdk has text detection but also has iOS cocoapod. So keep an eye on it as should add text recognition to the iOS eventually.
https://developers.google.com/vision/text-overview
//Correction: just tried it but only Android version of the sdk supports text detection.
https://developers.google.com/vision/text-overview
If you subscribe to releases: https://libraries.io/cocoapods/GoogleMobileVision
Click SUBSCRIBE TO RELEASES you can see when TextDetection is added to the iOS part of the Cocoapod
This is how to do it ...
// // ViewController.swift // import UIKit import Vision import CoreML class ViewController: UIViewController { //HOLDS OUR INPUT var inputImage:CIImage? //RESULT FROM OVERALL RECOGNITION var recognizedWords:[String] = [String]() //RESULT FROM RECOGNITION var recognizedRegion:String = String() //OCR-REQUEST lazy var ocrRequest: VNCoreMLRequest = { do { //THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z) let model = try VNCoreMLModel(for:OCR().model) return VNCoreMLRequest(model: model, completionHandler: self.handleClassification) } catch { fatalError("cannot load model") } }() //OCR-HANDLER func handleClassification(request: VNRequest, error: Error?) { guard let observations = request.results as? [VNClassificationObservation] else {fatalError("unexpected result") } guard let best = observations.first else { fatalError("cant get best result")} self.recognizedRegion = self.recognizedRegion.appending(best.identifier) } //TEXT-DETECTION-REQUEST lazy var textDetectionRequest: VNDetectTextRectanglesRequest = { return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection) }() //TEXT-DETECTION-HANDLER func handleDetection(request:VNRequest, error: Error?) { guard let observations = request.results as? [VNTextObservation] else {fatalError("unexpected result") } // EMPTY THE RESULTS self.recognizedWords = [String]() //NEEDED BECAUSE OF DIFFERENT SCALES let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!, y: (self.inputImage?.extent.size.height)!) //A REGION IS LIKE A "WORD" for region:VNTextObservation in observations { guard let boxesIn = region.characterBoxes else { continue } //EMPTY THE RESULT FOR REGION self.recognizedRegion = "" //A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0... 1.0) for box in boxesIn { //SCALE THE BOUNDING BOX TO PIXELS let realBoundingBox = box.boundingBox.applying(transform) //TO BE SURE guard (inputImage?.extent.contains(realBoundingBox))! else { print("invalid detected rectangle"); return} //SCALE THE POINTS TO PIXELS let topleft = box.topLeft.applying(transform) let topright = box.topRight.applying(transform) let bottomleft = box.bottomLeft.applying(transform) let bottomright = box.bottomRight.applying(transform) //LET'S CROP AND RECTIFY let charImage = inputImage? .cropped(to: realBoundingBox) .applyingFilter("CIPerspectiveCorrection", parameters: [ "inputTopLeft" : CIVector(cgPoint: topleft), "inputTopRight" : CIVector(cgPoint: topright), "inputBottomLeft" : CIVector(cgPoint: bottomleft), "inputBottomRight" : CIVector(cgPoint: bottomright) ]) //PREPARE THE HANDLER let handler = VNImageRequestHandler(ciImage: charImage!, options: [:]) //SOME OPTIONS (TO PLAY WITH..) self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill //FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US !! do { try handler.perform([self.ocrRequest]) } catch { print("Error")} } //APPEND RECOGNIZED CHARS FOR THAT REGION self.recognizedWords.append(recognizedRegion) } //THATS WHAT WE WANT - PRINT WORDS TO CONSOLE DispatchQueue.main.async { self.PrintWords(words: self.recognizedWords) } } func PrintWords(words:[String]) { // VOILA' print(recognizedWords) } func doOCR(ciImage:CIImage) { //PREPARE THE HANDLER let handler = VNImageRequestHandler(ciImage: ciImage, options:[:]) //WE NEED A BOX FOR EACH DETECTED CHARACTER self.textDetectionRequest.reportCharacterBoxes = true self.textDetectionRequest.preferBackgroundProcessing = false //FEED IT TO THE QUEUE FOR TEXT-DETECTION DispatchQueue.global(qos: .userInteractive).async { do { try handler.perform([self.textDetectionRequest]) } catch { print ("Error") } } } override func viewDidLoad() { super.viewDidLoad() // Do any additional setup after loading the view, typically from a nib. //LETS LOAD AN IMAGE FROM RESOURCE let loadedImage:UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too //WE NEED A CIIMAGE - NOT NEEDED TO SCALE inputImage = CIImage(image:loadedImage)! //LET'S DO IT self.doOCR(ciImage: inputImage!) } override func didReceiveMemoryWarning() { super.didReceiveMemoryWarning() // Dispose of any resources that can be recreated. } }
You'll find the complete project here included is the trained model !
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With