I'm building an OCR. For that I'm using <code>CNN</code>, <code>RNN</code> and <code>CTC</code> Loss Function. My input layer gets image and output layer predicts what's written on that image. Labels are converted into integer. <pre class="prettyprint"><code>['A', 'B', 'C'] -> A = 0, B = 1, C = 2 </code></pre> If the image is ABC, training label will be 0,1,2 (Single row vector) I'm able to accomplish this on single line. For eg. '<code>ABCDE</code>' is written on an image and model works great. But if the image is <pre class="prettyprint"><code>'ABC' 'CAB' </code></pre> then what should be the training label ? How can I tell the model about next line ? I want to train a model on multiple line.

You want to recognize text of a document containing multiple lines. There are two ways to achieve this: <ol> <li>Segment the document into lines as a pre-processing step, then feed each segmented line separately into your neural network. If you want to go this way, e.g. read the paper [1] from Bunke and Marti. They essentially count the black-white transitions for each scanline and create a histogram out of it. They use the minimums of the histogram to split the document into individual lines. There are some other methods too to segment a document into lines.</li> <li>Train the neural network to implicitly segment the document into lines. You need to add attention to the neural network, such that it can focus on individual lines. Bluche has done some great work towards text recognition on document-level. See the paper [2] and the website [3].</li> </ol> [1] Bunke, Marti: The IAM-database: an English sentence database for offline handwriting recognition. Download via Springer [2] Bluche: Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition. Download via https://arxiv.org/abs/1604.08352 [3] Bluche: Scan, Attend and Read. See http://www.tbluche.com/scan_attend_read.html and look for "Handwriting Recognition with MDLSTM and CTC" and "The Collapse Layer and its Proposed Replacements"

Optical Character Recognition Multiple Line Detection

Tags:

python

tensorflow

ocr

keras

I'm building an OCR. For that I'm using CNN, RNN and CTC Loss Function. My input layer gets image and output layer predicts what's written on that image. Labels are converted into integer.

['A', 'B', 'C'] -> A = 0, B = 1, C = 2

If the image is ABC, training label will be 0,1,2 (Single row vector)

I'm able to accomplish this on single line. For eg. 'ABCDE' is written on an image and model works great. But if the image is

'ABC'

'CAB'

then what should be the training label ? How can I tell the model about next line ? I want to train a model on multiple line.

389

asked Dec 26 '18 07:12

Kartikeya Rana

1 Answers

You want to recognize text of a document containing multiple lines. There are two ways to achieve this:

Segment the document into lines as a pre-processing step, then feed each segmented line separately into your neural network. If you want to go this way, e.g. read the paper [1] from Bunke and Marti. They essentially count the black-white transitions for each scanline and create a histogram out of it. They use the minimums of the histogram to split the document into individual lines. There are some other methods too to segment a document into lines.
Train the neural network to implicitly segment the document into lines. You need to add attention to the neural network, such that it can focus on individual lines. Bluche has done some great work towards text recognition on document-level. See the paper [2] and the website [3].

[1] Bunke, Marti: The IAM-database: an English sentence database for offline handwriting recognition. Download via Springer

[2] Bluche: Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition. Download via https://arxiv.org/abs/1604.08352

[3] Bluche: Scan, Attend and Read. See http://www.tbluche.com/scan_attend_read.html and look for "Handwriting Recognition with MDLSTM and CTC" and "The Collapse Layer and its Proposed Replacements"

answered Oct 22 '22 02:10

Harry

Related questions
                            
                                How to combine the phase of one image and magnitude of different image into 1 image by using python
                            
                                How to create a type that is closed under inherited operations?
                            
                                Keras floods Jupyter cell output during fit (verbose=1)
                            
                                How to create an abstract subclass of a concrete superclass in Python 3?
                            
                                Multiprocessing slower than serial processing in Windows (but not in Linux)
                            
                                Use __init__.py to modify sys path is a good idea?
                            
                                Permission Check Discord.py Bot
                            
                                can't understand [Errno 111] Connection refused
                            
                                Why doesn't tkinter release memory when an instance is destroyed?
                            
                                Multiprocessing large XML file with shared memory complex objects
                            
                                Tkinter button expand using grid
                            
                                How do I create a seaborn line plot for PySpark dataframe?
                            
                                OpenSSL: error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version
                            
                                AttributeError when training CNN 1D with Python Keras
                            
                                Python fuzzy string matching as correlation style table/matrix
                            
                                How do I downsample a 1d numpy array?
                            
                                Fill missing value by averaging previous row value
                            
                                how to create custom operators in airflow and use them in airflow template which is running through cloud composer(in google cloud platform)
                            
                                What is the equivalent way of doing this type of pythonic vectorized assignment in MATLAB?
                            
                                Structure of package that can also be run as command line script

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With