Align text for OCR

Tags: python, image-processing, ocr

I am creating a database from historical records, which I have as photographed pages from books (100K+ pages). I wrote some Python code to do some image processing before I OCR each page. Since the data in these books does not come in well-formatted tables, I need to segment each page into rows and columns and then OCR each piece separately.

One of the critical steps is to align the text in the image.

For example, this is a typical page that needs to be aligned:

[image: page to align]

A solution I found is to smudge the text horizontally (I'm using skimage.ndimage.morphology.binary_dilation) and find the rotation that maximizes the sum of white pixels along the horizontal dimension.
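A minimal sketch of this kind of projection search, assuming NumPy only and a boolean ink mask as input. It is not the notebook code: instead of dilating and rotating the image, it rotates the coordinates of the ink pixels, and it scores each candidate angle by the sum of squared row counts, which is an assumed scoring rule. The function name and its parameters are hypothetical.

```python
import numpy as np

def estimate_text_tilt(binary, max_angle=5.0, step=0.1):
    """Return the tilt (degrees) whose row projection is sharpest.

    `binary` is a 2-D boolean array, True where there is ink.
    The tilt is measured in array coordinates (rows increase downward).
    """
    ys, xs = np.nonzero(binary)
    diag = int(np.ceil(np.hypot(*binary.shape)))
    angles = np.arange(-max_angle, max_angle + step / 2, step)
    scores = []
    for angle in angles:
        theta = np.deg2rad(angle)
        # Row coordinate each ink pixel would have after undoing a tilt of theta.
        rows = ys * np.cos(theta) - xs * np.sin(theta)
        # One-pixel-wide bins over a fixed range, so scores are comparable.
        profile, _ = np.histogram(rows, bins=2 * diag, range=(-diag, diag))
        # Aligned lines pile ink into few rows, which maximizes the sum of squares.
        scores.append(np.sum(profile.astype(np.float64) ** 2))
    return float(angles[int(np.argmax(scores))])
```

The returned value is the estimated tilt of the text lines. The sign of the correcting rotation depends on the convention of whichever rotate function is used afterwards, so it is worth checking on one page first.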

This works fine, but it takes about 8 seconds per page, which, given the volume of pages I am working with, is way too much.

Do you know of a better, faster way to align the text?

Update:

I use scikit-image for image processing functions, and scipy to maximize the count of white pixels along the horizontal axis.

Here is a link to an html view of the Jupyter notebook I used to work on this. The code uses some functions from a module I've written for this project so it cannot be run on its own.

Link to notebook (dropbox): https://db.tt/Mls9Tk8s

Update 2:

Here is a link to the original raw image (dropbox): https://db.tt/1t9kAt0z


asked Nov 13 '15 17:11

Maturin

2 Answers

Preface: I haven't done much image processing with Python. I can give you an image processing suggestion, but you'll have to implement it in Python yourself. All you need is an FFT and a polar transformation (I think OpenCV has a built-in function for that), so that should be straightforward.

You have only posted one sample image, so I don't know if this works as well for other images, but for this image, a Fourier transform can be very useful: Simply pad the image to a nice power of two (e.g. 2048x2048) and you get a Fourier spectrum like this:

[image: Fourier spectrum of the padded page]

I've posted an intuitive explanation of the Fourier transform here, but in short: your image can be represented as a series of sine/cosine waves, and most of those "waves" are parallel or perpendicular to the document orientation. That's why you see a strong frequency response at roughly 0°, 90°, 180° and 270°. To measure the exact angle, you could take a polar transform of the Fourier spectrum:

[image: polar transform of the Fourier spectrum]

and simply take the columnwise mean:

[image: columnwise mean of the polar transform]

The peak position in that diagram is at 90.835°, and if I rotate the image by -90.835° modulo 90° (i.e. by -0.835°), the orientation looks decent:

[image: page after rotation]

Like I said, I don't have more test images, but it works for rotated versions of your image. At the very least it should narrow down the search space for a more expensive search method.

Note 1: The FFT is fast, but it obviously takes more time for larger images. And sadly, the best way to get a better angle resolution is to use a larger input image (i.e. with more white padding around the source image).

Note 2: the FFT actually returns an image where the "DC" (the center in the spectrum image above) is at the origin 0/0. But the rotation property is clearer if you shift it to the center, and it makes the polar transform easier, so I just showed the shifted version.
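The steps above (pad, FFT, polar transform, columnwise mean, peak) can be sketched with NumPy alone. This is a rough sketch under stated assumptions, not a tuned implementation: the input is assumed to be an ink mask with a zero background (so zero padding plays the role of the white padding), the polar transform is done by nearest-neighbour sampling along rays rather than with an OpenCV call, and the function name, the radius range and the angle step are all hypothetical choices.

```python
import numpy as np

def estimate_tilt_fft(ink, angle_step=0.1):
    """Estimate text tilt in degrees from the dominant spectrum direction.

    `ink` is a 2-D float array, larger where there is ink, 0 for background.
    """
    h, w = ink.shape
    size = 1 << int(np.ceil(np.log2(max(h, w))))  # next power of two
    padded = np.zeros((size, size), dtype=np.float64)
    padded[:h, :w] = ink

    # Magnitude spectrum with the DC term shifted to the centre.
    spectrum = np.fft.fftshift(np.abs(np.fft.fft2(padded)))
    log_mag = np.log1p(spectrum)

    # Polar transform: sample along rays from the centre, skipping the
    # lowest frequencies around DC. Half a turn is enough because the
    # magnitude spectrum of a real image is point-symmetric.
    center = size // 2
    radii = np.arange(size // 16, size // 2 - 1)[:, None]
    angles = np.arange(0.0, 180.0, angle_step)
    theta = np.deg2rad(angles)[None, :]
    fx = np.rint(center + radii * np.cos(theta)).astype(int)
    fy = np.rint(center + radii * np.sin(theta)).astype(int)
    polar = log_mag[fy, fx]  # shape: (number of radii, number of angles)

    # Columnwise mean, then the peak angle folded into [-45, 45) degrees.
    peak = angles[int(np.argmax(polar.mean(axis=0)))]
    return float((peak + 45.0) % 90.0 - 45.0)
```

Folding the peak into [-45°, 45°) is one way to express the "modulo 90" step. It treats peaks near 0° and near 90° the same, so it cannot tell an upright page from one lying on its side.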


answered Sep 23 '22 16:09

Niki

This is not a full solution but there is more than a comment's worth of thoughts.

You have a margin on the left, right, top and bottom of your image. If you remove that, and even cut into the text in the process, you will still have enough information to align the image. So, if you chop, say, 15% off each of the top, bottom, left and right, you will have reduced your image area by about 50% already, which will speed things up down the line.

Now take your remaining central area and divide it into, say, 10 strips, all of the same height but the full width of the page. Now calculate the mean brightness of those strips and take the 1-4 darkest, as they contain the most (black) lettering. Now work on each of those in parallel, or just the darkest. You are now processing just the most interesting 5-20% of the page.

Here is the command to do that in ImageMagick - it's just my weapon of choice and you can do it just as well in Python.

convert scan.jpg -crop 300x433+64+92 -crop x10@ -format "%[fx:mean]\n" info:

0.899779
0.894842
0.967889
0.919405
0.912941
0.89933
0.883133    <--- choose 4th last because it is darkest
0.889992
0.88894
0.888865

If I make separate images out of those 10 stripes, I get this

convert scan.jpg -crop 300x433+64+92 -crop x10@ m-.jpg

[image: the 10 strips as separate images]

and effectively, I do the alignment on the fourth last image rather than the whole image.

Maybe unscientific, but quite effective and pretty easy to try out.
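The same crop-and-pick-the-darkest-strip idea can be sketched in Python with NumPy. This is a rough equivalent of the ImageMagick commands rather than a port of them: it assumes a 2-D grayscale array where larger values are brighter, and the function name and its default parameters are hypothetical.

```python
import numpy as np

def darkest_strips(gray, margin=0.15, n_strips=10, keep=1):
    """Trim the margins, cut the rest into horizontal strips, keep the darkest.

    Returns (indices of the kept strips, the kept strips, all strip means).
    """
    h, w = gray.shape
    top, left = int(h * margin), int(w * margin)
    # Drop `margin` of the page from every side.
    core = gray[top:h - top, left:w - left]
    # Full-width strips of (nearly) equal height.
    strips = np.array_split(core, n_strips, axis=0)
    means = np.array([strip.mean() for strip in strips])
    # Lowest mean brightness means the most ink.
    order = [int(i) for i in np.argsort(means)[:keep]]
    return order, [strips[i] for i in order], means
```

The alignment search then runs on the returned strips only, and the angle it finds is applied to the full page.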

Another thought: once you have your procedure/script sorted out for straightening a single image, do not forget you can often get a massive speedup by using GNU Parallel to harass all your CPU's lovely, expensive cores simultaneously. Here I specify 8 processes to run in parallel...

#!/bin/bash
# Print one command per page; GNU Parallel reads them and runs 8 at a time.
for ((i=0;i<100000;i++)); do
   echo ProcessPage $i
done | parallel --eta -j 8
answered Sep 24 '22 16:09

Mark Setchell
