I'm building an OCR model. For that I'm using a CNN, an RNN and the CTC loss function.
My input layer takes an image and my output layer predicts what is written on that image. Labels are converted into integers:
['A', 'B', 'C'] -> A = 0, B = 1, C = 2
If the image shows ABC, the training label is 0,1,2 (a single row vector).
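For concreteness, here is a minimal Python sketch of that encoding (CHARSET and encode_label are illustrative names, not from any particular library):

CHARSET = ['A', 'B', 'C', 'D', 'E']
CHAR_TO_IDX = {c: i for i, c in enumerate(CHARSET)}  # A=0, B=1, C=2, ...

def encode_label(text):
    # Convert a ground-truth string into the integer vector fed to the CTC loss.
    return [CHAR_TO_IDX[c] for c in text]

print(encode_label('ABC'))  # [0, 1, 2]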
I'm able to accomplish this for a single line. For example, when 'ABCDE' is written on an image, the model works great. But if the image contains
'ABC'
'CAB'
then what should the training label be? How can I tell the model about the next line? I want to train the model on multi-line images.
You want to recognize the text of a document containing multiple lines. There are two ways to achieve this:
Segment the document into lines as a pre-processing step, then feed each segmented line separately into your neural network. If you want to go this way, see e.g. the paper [1] from Marti and Bunke. They essentially count the black-white transitions for each scanline, build a histogram from these counts, and use the minima of the histogram to split the document into individual lines (a sketch of this approach follows the list). There are other methods to segment a document into lines as well.
Train the neural network to implicitly segment the document into lines. For this you need to add an attention mechanism to the network so that it can focus on individual lines. Bluche has done some great work on text recognition at the document level; see the paper [2] and the website [3].
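Here is a minimal Python sketch of option 1, assuming a binarized image (1 = ink, 0 = background) as input. It counts black-white transitions per scanline and splits at rows where the count drops to zero, a simplification of splitting at the histogram minima described in [1]:

import numpy as np

def segment_lines(binary_img):
    # Black-white transitions per scanline: blank rows have zero transitions.
    profile = np.abs(np.diff(binary_img.astype(int), axis=1)).sum(axis=1)
    is_text = profile > 0
    lines, start = [], None
    for y, text_row in enumerate(is_text):
        if text_row and start is None:
            start = y                          # a new text line begins
        elif not text_row and start is not None:
            lines.append(binary_img[start:y])  # the line ended on the previous row
            start = None
    if start is not None:                      # image ends inside a text line
        lines.append(binary_img[start:])
    return lines

Each cropped line can then be encoded and fed to your single-line CNN-RNN-CTC model exactly as before.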
[1] Marti, Bunke: The IAM-database: an English sentence database for offline handwriting recognition. Available via Springer.
[2] Bluche: Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition. https://arxiv.org/abs/1604.08352
[3] Bluche: Scan, Attend and Read. http://www.tbluche.com/scan_attend_read.html; look for "Handwriting Recognition with MDLSTM and CTC" and "The Collapse Layer and its Proposed Replacements".