 

Detect text orientation

How to detect text orientation in an image?

It doesn't matter if the orientation is upside down (180 deg), but if the text lines are vertical (90 or 270 deg) I need to rotate the image by 90 degrees.

I hope it's possible without OCR, because running OCR on four different orientations of the same image takes too many resources.

The reason is that I use ScanTailor on images from a digital camera or smartphone, and if the text orientation is 90 or 270 degrees the image is sometimes cropped and text is lost.

asked May 21 '14 by clarkk


4 Answers

A technique I've used successfully is the Radon transform. You can find an example implementation in Python here. You can also use the projection you get to detect the line spacing; the Python implementation above shows how to do that as well.

The intuitive explanation goes like this. We work with a grayscale image. Imagine you have a light source and some way of counting the light rays that reach a surface (a detector). Now imagine each character on the page acts as a wall that blocks some of the light from passing through. If you shine light at an angle in the plane of the page and put the detector on the other side, you will get the maximum light only when the light shines between the lines of text. So the idea is to rotate that light source 180° around the page; the angle at which the detector captures the most light is the angle of your text. That is, intuitively, how the Radon transform works.

For the technical explanation of the Radon transform, please see Wikipedia or other sources.

This technique lets you detect the rotation of the text between 0° and 180° very precisely (it cannot tell whether the text is upside down), depending on how many "increments" of the 180° rotation you try. Of course, more precision (more increments) also means more processing time. For your use case, since you already know the text is either horizontal or vertical, you can just try two increments of 90°, which should be quite fast.
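As a rough sketch of that two-increment idea (my own illustration, not from the answer; it uses scikit-image's radon, and the file name is a placeholder):

import numpy as np
from skimage.io import imread
from skimage.transform import radon

img = imread('page.png', as_gray=True)   # hypothetical input path, values in [0, 1]
img = 1.0 - img                          # invert so text is bright on a dark background
# project at just the two candidate angles instead of a full 0-180 sweep
sinogram = radon(img, theta=[0.0, 90.0], circle=False)
# the projection that cuts across the text lines alternates between line
# and gap, so its variance is much higher than the perpendicular one
variances = sinogram.var(axis=0)
best = [0.0, 90.0][int(np.argmax(variances))]
print('strongest line structure in the projection at', best, 'degrees')

Which of the two angles corresponds to "the lines are horizontal" depends on the library's angle convention, so it is worth calibrating once against an image whose orientation you already know.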

Then you need to use another technique to detect if it is upside down or not.

answered Nov 11 '22 by mananony


You can use the Hough Transform to detect the longest lines in your image and then find the predominant slope of those lines. If the slope is close to zero, your text is horizontal; if it's close to infinity, your text is vertical.

You don't mention whether you're using a library to do this, but in OpenCV you could use HoughLinesP. I used this tutorial on an image found on Wikimedia:

horizontal text

to get this image:

horizontal output

Then I rotated the original image:

vertical text

to get this:

vertical output

Since you're only interested in horizontal or vertical, you can just test whether the difference in the x-coordinates of the line endpoints is close to zero (vertical) or the difference in the y-coordinates is close to zero (horizontal).
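A minimal sketch of that test with OpenCV's HoughLinesP (my own illustration; the file name and the Canny/Hough thresholds are placeholder values):

import cv2
import numpy as np

img = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input path
edges = cv2.Canny(img, 50, 150)                      # edge map to feed the Hough transform
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                        minLineLength=100, maxLineGap=10)

horizontal = vertical = 0
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        # small y-difference -> near-horizontal line, small x-difference -> near-vertical
        if abs(y2 - y1) < abs(x2 - x1):
            horizontal += 1
        else:
            vertical += 1
print('text looks', 'horizontal' if horizontal >= vertical else 'vertical')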

answered Nov 11 '22 by beaker


The proposed solution (Hough transform) is good (and I upvoted it), but it might be CPU intensive. Here is a quick-and-dirty alternative:

  1. Calculate a horizontal projection (sum the brightness of the pixels in each pixel row). It should clearly mark the positions of the text lines (bonus: you get a partition of the text into lines). Do Otsu binarization first so the partition stands out clearly.
  2. Rotate the image by 90 degrees and repeat step 1. If the text lines are now perpendicular to the pixel rows, the projection will just be a blurry mess with no clear partition into text lines. (Bonus: this projection will mark the borders of the page, and if the text is arranged in columns, you will get the structure of the columns.)
  3. Now you just need to decide which projection (step 1 or step 2) represents real text lines. You can count the blobs (one-dimensional blobs, so the processing is extremely fast) and choose the projection with more blobs (there are more text lines than text columns). Alternatively, you can just calculate the standard deviation of each projection vector and take the one with the higher std; this is even faster. A sketch of this appears after the list.
  4. All the above holds if the text runs at exactly 0 or 90 degrees. If it is rotated, say by 10 degrees, both projections will return a mess. In that case you can cut your document into, say, 5x5 pieces (25 pieces), perform steps 1-3 on each piece, and decide according to the majority.
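Here is a minimal sketch of steps 1-3 using the std variant (my own illustration, with OpenCV for reading and Otsu binarization; the file name is a placeholder):

import cv2

img = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input path
# Otsu binarization: background -> 255, text -> 0
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# step 1: horizontal projection = sum of the brightness in each pixel row
row_profile = bw.sum(axis=1)
# step 2: the projection of the 90-degree-rotated image is simply the sum
# of each pixel column, so no actual rotation is needed
col_profile = bw.sum(axis=0)

# step 3: the profile that cuts across the text lines shows a zebra pattern
# (high between lines, low on lines) and therefore a higher standard deviation
if row_profile.std() > col_profile.std():
    print('text lines are horizontal')
else:
    print('text lines are vertical')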

Note: The described solution is a bit less accurate than the Hough transform, but it is very easy to implement and extremely fast (the entire processing is faster than just calculating derivatives of the image). You also get the orientation of the text lines and a partition of the document into lines & columns for free.

Good luck

Addition & clarification to step 1: Suppose you have an image of width W and height H with black text on a white background. By doing a horizontal projection you sum the values of the pixels in each row; the result is a vector of length H. Pixel rows that don't include any part of the text (i.e. rows located between the text lines) yield high projection values, because the background is white (255). Pixel rows that include parts of letters yield lower projection values.

So now you have a vector of length H and you want to see whether there is a clear partition of values inside it: a group of high values, then a group of low values, and so on (like zebra stripes). Example: if there are 20 pixels between text lines and each letter is 16 pixels high, you expect the projection vector to have 20 high values followed by 16 low values, then 20 high, 16 low, etc. Of course the document is not ideal; each letter has a different height, and some stick out above or below the line (like 't', 'q', 'i'), but the general pattern of the partition holds. Conversely, if you rotate the document by 90 degrees so that your summation no longer aligns with the lines of text, the result vector will just have H roughly random values with no clear partition into groups.

Now all you need to do is decide whether your result vector has a good partition or not. A quick way is to calculate the standard deviation of the values: if there is a partition, the std will be high; otherwise it will be lower. Another way is to binarize your projection vector, treat it as a new image of size 1xH, run connected component analysis, and extract the blobs. This is very fast because the blobs are one-dimensional. The bright blobs mark roughly the areas between text lines and the dark gaps mark the text lines themselves. If your summation was good (the vector had a clear partition), you will get a few large blobs (roughly as many blobs as there are lines, with the median blob length roughly equal to the distance between text lines). But if your summation was wrong (the document was rotated by 90 degrees), you will get many random blobs. The connected component analysis requires a bit more code than the std, but it also gives you the locations of the text lines: line 'i' lies between blob 'i' and blob 'i+1'.
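And a sketch of the one-dimensional blob count on the projection vector (again my own illustration; thresholding on the mean is just one simple choice instead of Otsu):

import numpy as np

# 'row_profile' is the projection vector from the previous sketch;
# binarize it so gaps between text lines become True (bright)
bright = row_profile > row_profile.mean()
# a bright blob starts wherever the vector switches from dark to bright,
# so counting rising edges counts the gaps between text lines
blob_count = np.count_nonzero(~bright[:-1] & bright[1:]) + int(bright[0])
print('found', blob_count, 'bright blobs (gaps between text lines)')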

answered Nov 11 '22 by DanielHsH


In Python, you can do the following using pytesseract:

import re
import skimage.io
import pytesseract

img_path = '/home/name/Pictures/Screenshot from 2019-03-21 13-33-54 (copy).png'
im = skimage.io.imread(img_path)
# run Tesseract's orientation and script detection (OSD) on the image
newdata = pytesseract.image_to_osd(im, nice=1)
# pull the rotation (in degrees) out of the OSD report
re.search(r'(?<=Rotate: )\d+', newdata).group(0)
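The extracted value can then be fed into a rotation step. A small follow-up sketch continuing the snippet above (reading Tesseract's 'Rotate:' field as the clockwise correction is my interpretation of its OSD output, so verify it once on a known image):

from skimage.transform import rotate

# 'Rotate: N' is the clockwise rotation (0/90/180/270) that uprights the text;
# skimage's rotate is counter-clockwise for positive angles, hence the minus sign
angle = int(re.search(r'(?<=Rotate: )\d+', newdata).group(0))
fixed = rotate(im, -angle, resize=True)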

Hope this still helps!

answered Nov 11 '22 by Tropiquo