Converting a Vision VNTextObservation to a String

Question

I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:

1) class VNDetectTextRectanglesRequest

2) class VNTextObservation

It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that can be interpreted by NSLinguisticTagger?

Here's a post that is a brief overview of Vision.

Thank you for reading.

brian.clear · Accepted Answer

SwiftOCR

I just got SwiftOCR to work with small sets of text.

https://github.com/garnele007/SwiftOCR

uses

https://github.com/Swift-AI/Swift-AI

which uses the NeuralNet-MNIST model for text recognition.

TODO : VNTextObservation > SwiftOCR

I will post an example of it using VNTextObservation once I have connected one to the other.

OpenCV + Tesseract OCR

I tried to use OpenCV + Tesseract but got compile errors, then found SwiftOCR.

SEE ALSO : Google Vision iOS

Note: Google Vision Text Recognition has text detection in its Android SDK, and there is also an iOS CocoaPod. Keep an eye on it, as text recognition should be added to the iOS version eventually.

https://developers.google.com/vision/text-overview

Correction: I just tried it, and only the Android version of the SDK supports text detection.

You can subscribe to releases at https://libraries.io/cocoapods/GoogleMobileVision

Click SUBSCRIBE TO RELEASES there to see when TextDetection is added to the iOS part of the CocoaPod.

DrNeurosurg · Answer

This is how to do it ...

    //
    //  ViewController.swift
    //

    import UIKit
    import Vision
    import CoreML

    class ViewController: UIViewController {

        //HOLDS OUR INPUT
        var inputImage:CIImage?

        //RESULT FROM OVERALL RECOGNITION
        var recognizedWords:[String] = [String]()

        //RESULT FROM RECOGNITION
        var recognizedRegion:String = String()

        //OCR-REQUEST
        lazy var ocrRequest: VNCoreMLRequest = {
            do {
                //THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z)
                let model = try VNCoreMLModel(for:OCR().model)
                return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
            } catch {
                fatalError("cannot load model")
            }
        }()

        //OCR-HANDLER
        func handleClassification(request: VNRequest, error: Error?)
        {
            guard let observations = request.results as? [VNClassificationObservation]
                else {fatalError("unexpected result") }
            guard let best = observations.first
                else { fatalError("cant get best result")}

            self.recognizedRegion = self.recognizedRegion.appending(best.identifier)
        }

        //TEXT-DETECTION-REQUEST
        lazy var textDetectionRequest: VNDetectTextRectanglesRequest = {
            return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection)
        }()

        //TEXT-DETECTION-HANDLER
        func handleDetection(request:VNRequest, error: Error?)
        {
            guard let observations = request.results as? [VNTextObservation]
                else {fatalError("unexpected result") }

            // EMPTY THE RESULTS
            self.recognizedWords = [String]()

            //NEEDED BECAUSE OF DIFFERENT SCALES
            let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!, y: (self.inputImage?.extent.size.height)!)

            //A REGION IS LIKE A "WORD"
            for region:VNTextObservation in observations
            {
                guard let boxesIn = region.characterBoxes else {
                    continue
                }

                //EMPTY THE RESULT FOR REGION
                self.recognizedRegion = ""

                //A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0... 1.0)
                for box in boxesIn
                {
                    //SCALE THE BOUNDING BOX TO PIXELS
                    let realBoundingBox = box.boundingBox.applying(transform)

                    //TO BE SURE
                    guard (inputImage?.extent.contains(realBoundingBox))!
                        else { print("invalid detected rectangle"); return}

                    //SCALE THE POINTS TO PIXELS
                    let topleft = box.topLeft.applying(transform)
                    let topright = box.topRight.applying(transform)
                    let bottomleft = box.bottomLeft.applying(transform)
                    let bottomright = box.bottomRight.applying(transform)

                    //LET'S CROP AND RECTIFY
                    let charImage = inputImage?
                        .cropped(to: realBoundingBox)
                        .applyingFilter("CIPerspectiveCorrection", parameters: [
                            "inputTopLeft" : CIVector(cgPoint: topleft),
                            "inputTopRight" : CIVector(cgPoint: topright),
                            "inputBottomLeft" : CIVector(cgPoint: bottomleft),
                            "inputBottomRight" : CIVector(cgPoint: bottomright)
                            ])

                    //PREPARE THE HANDLER
                    let handler = VNImageRequestHandler(ciImage: charImage!, options: [:])

                    //SOME OPTIONS (TO PLAY WITH..)
                    self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill

                    //FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US !!
                    do {
                        try handler.perform([self.ocrRequest])
                    } catch { print("Error")}
                }

                //APPEND RECOGNIZED CHARS FOR THAT REGION
                self.recognizedWords.append(recognizedRegion)
            }

            //THATS WHAT WE WANT - PRINT WORDS TO CONSOLE
            DispatchQueue.main.async {
                self.PrintWords(words: self.recognizedWords)
            }
        }

        func PrintWords(words:[String])
        {
            // VOILA'
            print(recognizedWords)
        }

        func doOCR(ciImage:CIImage)
        {
            //PREPARE THE HANDLER
            let handler = VNImageRequestHandler(ciImage: ciImage, options:[:])

            //WE NEED A BOX FOR EACH DETECTED CHARACTER
            self.textDetectionRequest.reportCharacterBoxes = true
            self.textDetectionRequest.preferBackgroundProcessing = false

            //FEED IT TO THE QUEUE FOR TEXT-DETECTION
            DispatchQueue.global(qos: .userInteractive).async {
                do {
                    try handler.perform([self.textDetectionRequest])
                } catch {
                    print("Error")
                }
            }
        }

        override func viewDidLoad() {
            super.viewDidLoad()
            // Do any additional setup after loading the view, typically from a nib.

            //LETS LOAD AN IMAGE FROM RESOURCE
            let loadedImage:UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too

            //WE NEED A CIIMAGE - NOT NEEDED TO SCALE
            inputImage = CIImage(image:loadedImage)!

            //LET'S DO IT
            self.doOCR(ciImage: inputImage!)
        }

        override func didReceiveMemoryWarning() {
            super.didReceiveMemoryWarning()
            // Dispose of any resources that can be recreated.
        }
    }

You'll find the complete project here; the trained model is included.
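The step marked "NEEDED BECAUSE OF DIFFERENT SCALES" is the part that is easiest to get wrong, so here is a minimal sketch of just that arithmetic. Vision reports each character box in normalized coordinates (0...1, origin in the lower-left corner), and the code above multiplies them by the image width and height to get pixels. To keep the sketch runnable without UIKit or Vision, `Rect`, `scaleToPixels`, and `contains` are hypothetical stand-ins for `CGRect`, `applying(_:)`, and `CGRect.contains(_:)`; they are not part of any Apple framework, and the 800 x 600 image size is made up.

```swift
// Hypothetical stand-in for CGRect so this runs without CoreGraphics.
struct Rect: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

/// Scales a normalized rectangle (components in 0...1) to pixel units.
/// Mirrors what applying a scale-only CGAffineTransform does to a CGRect.
func scaleToPixels(_ normalized: Rect, imageWidth: Double, imageHeight: Double) -> Rect {
    return Rect(x: normalized.x * imageWidth,
                y: normalized.y * imageHeight,
                width: normalized.width * imageWidth,
                height: normalized.height * imageHeight)
}

/// Returns true when `inner` lies entirely inside `outer`.
/// Plays the role of the "TO BE SURE" guard in the answer's code.
func contains(_ outer: Rect, _ inner: Rect) -> Bool {
    return inner.x >= outer.x
        && inner.y >= outer.y
        && inner.x + inner.width <= outer.x + outer.width
        && inner.y + inner.height <= outer.y + outer.height
}

// Assumed image size, for illustration only.
let imageBounds = Rect(x: 0, y: 0, width: 800, height: 600)

// A made-up normalized character box, as Vision might report it.
let normalizedBox = Rect(x: 0.25, y: 0.5, width: 0.125, height: 0.25)
let pixels = scaleToPixels(normalizedBox,
                           imageWidth: imageBounds.width,
                           imageHeight: imageBounds.height)

// A box that extends past the right edge of the image after scaling.
let overflowing = scaleToPixels(Rect(x: 0.9, y: 0.1, width: 0.2, height: 0.1),
                                imageWidth: imageBounds.width,
                                imageHeight: imageBounds.height)

print(pixels)
print(contains(imageBounds, pixels), contains(imageBounds, overflowing))
```

In the real code the same check decides whether a character box is cropped and passed to the Core ML request or rejected as an "invalid detected rectangle".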

Tags:

ios

machine-learning

ocr

apple-vision

nslinguistictagger

Adrian
