How to detect text orientation in an image?
It doesn't matter if the orientation is upside down (180°), but if the text lines are vertical (90° or 270°) I need to rotate the image 90 degrees.
I hope it's possible without OCR, because running OCR on four different orientations of the same image takes too many resources.
The reason is that I use ScanTailor on images from a digital camera or smartphone, and if the text orientation is 90 or 270 degrees the image is sometimes cropped and text is lost.
A technique I've used successfully is the Radon transform. You can find an example implementation in Python here. You can also use the projection you get to detect the line spacing; the Python implementation above shows how to do that as well.
The intuitive explanation goes like this (we work with a grayscale image). Imagine you have a light source and some way of counting the number of light rays that reach a surface (a detector). Now imagine each character on the page acts as a wall that blocks some of the light from passing through. If you shine light at an angle in the plane of the page and put the detector on the other side, you will get the maximum light only when the light shines between the lines of the text. So the idea is to rotate that light source 180° around the page; the angle at which the detector captures the most light is the angle of your text. That's intuitively how a Radon transform works.
For the technical explanation of the Radon transform, please see Wikipedia or other sources.
This technique lets you detect the rotation of the text between 0° and 180° very precisely (it cannot tell whether the text is upside down), depending on how many increments of the 180° rotation you try. Of course, more precision (more increments) also means more processing time. For your use case, since you only need to distinguish 0°/180° from 90°/270°, you can just try two increments of 90°, which should be quite fast.
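For illustration, here is a minimal sketch of that two-angle check using skimage.transform.radon. The file name page.png is a placeholder, and the variance heuristic is an assumption: the projection taken along the text lines shows the strongest alternation between text rows and white gaps.

import numpy as np
from skimage.io import imread
from skimage.transform import radon

# Placeholder path; assumes dark text on a light background.
img = imread('page.png', as_gray=True)

# Radon transform at just two angles: 0 deg and 90 deg.
sinogram = radon(img, theta=[0, 90], circle=False)

# The projection taken along the text lines alternates between text rows and
# white gaps (the zebra pattern), so its variance is higher. With skimage's
# angle convention, theta=90 corresponds to summing along the original rows.
var0, var90 = sinogram[:, 0].var(), sinogram[:, 1].var()
print('text lines are horizontal' if var90 > var0 else 'text lines are vertical')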
Then you need to use another technique to detect if it is upside down or not.
You can use the Hough Transform to detect the longest lines in your image and then find the predominant slope of those lines. If the slope is close to zero, your text is horizontal; if it's close to infinity, your text is vertical.
You don't mention whether you're using a library for this, but in OpenCV you could use HoughLinesP. I applied this tutorial to an image found on Wikimedia; the detected lines follow the text direction, both for the original image and for a copy rotated 90°. [result images not shown]
Since you're only interested in horizontal versus vertical, you can simply test whether the difference in the x-coordinates of each line's endpoints is close to zero (vertical) or the difference in the y-coordinates is close to zero (horizontal).
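As a rough sketch of that check with OpenCV (the file name and the Canny/Hough parameters are placeholder assumptions to tune per document):

import cv2
import numpy as np

img = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)  # placeholder path
edges = cv2.Canny(img, 50, 150)

# Probabilistic Hough transform; threshold, minLineLength and maxLineGap
# are placeholder values that need tuning.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=100, maxLineGap=20)

horizontal = vertical = 0
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        # A line flatter than it is steep counts as horizontal.
        if abs(y2 - y1) < abs(x2 - x1):
            horizontal += 1
        else:
            vertical += 1

print('text is horizontal' if horizontal >= vertical else 'text is vertical')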
The proposed solution (the Hough transform) is good (and I upvoted it), but it might be CPU intensive. Here is a quick and dirty solution:

1. Do a horizontal projection of the image (sum the pixel values in each row) and check whether the resulting vector shows a clear partition into alternating groups of high and low values.
2. Repeat with a vertical projection (sum each column); whichever projection shows the clear partition runs along the text lines.
Note: the described solution is a bit less accurate than the Hough transform, but it is very easy to implement and extremely fast (the entire processing is faster than just calculating image derivatives), and you get for free the orientation of the text lines plus a partition of the document into lines and columns.
Good luck
Addition and clarification to step one. Suppose you have an image of width W and height H with black text on a white background. By doing a horizontal projection you sum the values of the pixels in each row; the result is a vector of length H. Pixel rows that don't include any part of the text (those located between the text lines) will yield high projection values, because the background is white (255). Pixel rows that include parts of letters will yield lower projection values.

So now you have a vector of length H and you want to see whether there is a clear partition of the values inside it: a group of high values, then a group of low values, and so on, like zebra stripes. For example, if you have 20 pixels between text lines and each letter is 16 pixels high, you expect the projection vector to have 20 large values followed by 16 low values, then 20 high, 16 low, etc. Of course the document is not ideal: each letter has a different height and some have holes or descenders (like 't', 'q', 'i'), but the general pattern of partition holds. On the contrary, if you rotate the document by 90 degrees, the summation no longer aligns with the lines of text, and the result vector will just contain roughly random values without a clear partition into groups.

Now all you need to do is decide whether your result vector has a good partition or not. A quick way is to calculate the standard deviation of the values: if there is a partition, the standard deviation will be high; otherwise it will be low. Another way is to binarize your projection vector, treat it as a new image of size 1xH, and launch a connected-components analysis to extract the blobs. This is very fast because the blobs are one-dimensional. The bright blobs mark roughly the areas between text lines and the dark gaps mark the text lines themselves. If the summation was good (the vector had a clear partition), you will have a few large blobs (roughly as many blobs as there are lines, with the median blob length roughly equal to the distance between text lines). If the summation was wrong (the document was rotated by 90 degrees), you will get many random blobs. The connected-components analysis requires a bit more code than the standard deviation, but it can also give you the locations of the text lines: line i lies between blob i and blob i+1.
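A minimal sketch of the standard-deviation variant (the file name is a placeholder; it assumes dark text on a white background):

import cv2

# Placeholder path; assumes dark text on a white background.
img = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)
H, W = img.shape

# Horizontal projection (sum of each row) and vertical projection (sum of each
# column). Dividing by the number of summed pixels makes the two comparable.
row_profile = img.sum(axis=1) / W   # length H
col_profile = img.sum(axis=0) / H   # length W

# The profile taken along the text lines shows the zebra pattern -> higher std.
if row_profile.std() > col_profile.std():
    print('text lines are horizontal')
else:
    print('text lines are vertical')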
In Python, you could do the following, using pytesseract:
import re
import pytesseract
from skimage import io

img_path = '/home/name/Pictures/Screenshot from 2019-03-21 13-33-54 (copy).png'
im = io.imread(img_path)

# Run Tesseract's orientation and script detection (OSD) at low process priority.
newdata = pytesseract.image_to_osd(im, nice=1)

# Extract the suggested rotation angle from the OSD report.
angle = re.search(r'(?<=Rotate: )\d+', newdata).group(0)
print(angle)
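Since you only need to fix the 90° and 270° cases, a possible follow-up sketch (note that image_to_osd runs Tesseract's orientation-and-script detection only, which is much cheaper than full OCR on four rotations):

import numpy as np

# If the reported correction angle is 90 or 270, a single 90-deg rotation puts
# the text lines horizontal; the 180-deg ambiguity doesn't matter here. Verify
# the sign convention of the 'Rotate' field if the exact direction matters.
if int(angle) in (90, 270):
    im = np.rot90(im)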
Hope this still helps!