
Detect if a text image is upside down

I have several hundred images (scanned documents), most of which are skewed. I want to deskew them using Python.
Here is the code I used:

import numpy as np
import cv2
from skimage.transform import radon

filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if w > 640:
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I)  # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2, h/2), 90 - rotation, 1)
dst = cv2.warpAffine(img, M, (w, h))
cv2.imwrite('rotated.jpg', dst)

This code works well for most documents, but it cannot distinguish 0 from 180 degrees, or 90 from 270 degrees: each pair is often detected as the same angle. As a result, I get a lot of upside-down documents.
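
The 0/180 ambiguity can be reproduced without the Radon transform. The sketch below uses numpy only, with a made-up synthetic `page` array and plain row sums standing in for a single projection of the sinogram. The projection of a page rotated by 180 degrees is the original projection reversed, so an RMS score is the same for both:

```python
import numpy as np

# Synthetic "page" (made up for illustration): three text-like bands on a blank background.
rng = np.random.default_rng(0)
page = np.zeros((60, 80))
for top in (5, 20, 35):
    page[top:top + 6, 10:70] = rng.random((6, 60))

def profile_rms(img):
    """RMS of the demeaned horizontal projection (row sums)."""
    img = img - img.mean()
    profile = img.sum(axis=1)
    return np.sqrt(np.mean(profile ** 2))

flipped = np.rot90(page, 2)  # rotate by 180 degrees

# The flipped profile is the original profile reversed, so the RMS cannot separate them.
print(np.allclose(page.sum(axis=1)[::-1], flipped.sum(axis=1)))
print(np.isclose(profile_rms(page), profile_rms(flipped)))
```

So some additional cue that is not symmetric under a 180-degree rotation is needed to tell the two orientations apart.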

Here is an example:
[Image: example scanned document]

The resulting image is the same as the input image.

Is there a way to detect whether an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.


EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:

import cv2
import pytesseract

print(pytesseract.image_to_osd(cv2.imread(file_name)))

If the document contains enough characters, Tesseract can detect the orientation. However, when the image has only a few lines, the orientation angle suggested by Tesseract is usually wrong, so this is not a fully reliable solution.
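
Since the result is unreliable on sparse pages, one option is to read the confidence that Tesseract reports and ignore low-confidence answers. The sketch below only parses the text that `image_to_osd` returns. The helper names, the sample values, and the `min_confidence` threshold are assumptions for illustration, not documented recommendations:

```python
import re

def parse_osd(osd_text):
    """Extract (rotation, confidence) from Tesseract OSD text, or None if missing."""
    rotate = re.search(r"Rotate:\s*(\d+)", osd_text)
    conf = re.search(r"Orientation confidence:\s*([\d.]+)", osd_text)
    if rotate is None or conf is None:
        return None
    return int(rotate.group(1)), float(conf.group(1))

def rotation_to_apply(osd_text, min_confidence=2.0):
    """Return the suggested rotation, or None if the result looks unreliable.

    min_confidence is an arbitrary illustrative threshold; tune it on your own scans.
    """
    parsed = parse_osd(osd_text)
    if parsed is None:
        return None
    angle, confidence = parsed
    return angle if confidence >= min_confidence else None

# Sample text in the shape image_to_osd typically returns (values are made up).
sample = (
    "Page number: 0\n"
    "Orientation in degrees: 180\n"
    "Rotate: 180\n"
    "Orientation confidence: 5.31\n"
    "Script: Latin\n"
    "Script confidence: 1.20\n"
)
print(rotation_to_apply(sample))
```

With pytesseract this would be called as `rotation_to_apply(pytesseract.image_to_osd(cv2.imread(file_name)))`, falling back to another method whenever it returns `None`.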

asked Apr 12 '19 14:04 by singrium




2 Answers

Assuming you have already run the angle correction on the image, you can try the following to find out whether it is flipped:

  1. Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
  2. Smooth this projection by convolving it with a Gaussian to get rid of fine structure, noise, etc.
  3. For each peak, check if the stronger sub-peak is on top or at the bottom.
  4. Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.

The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.

Here's a figure to illustrate the approach, showing a few lines of your example image:

  • Blue: Original projection
  • Orange: smoothed projection
  • Horizontal line: average of the smoothed projection for the whole image.

[Figure: projection of a few text lines from the example image]

Here's some code that does this:

import cv2
import numpy as np

# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]

# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)

# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]

# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1

print(lower_peaks / len(line_starts))

This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.

Note that this approach might break badly if the image contains pictures or anything else that is not organized in lines (math, for example). Another problem would be too few lines, resulting in bad statistics.

Also, different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
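
To try this on several images, the steps can be wrapped in a function that takes an already binarized array. This is a minimal numpy-only sketch, not the answer's exact code: the function name, the `sigma` parameter, and the synthetic test page are assumptions, and it uses `mode="same"` so that indices line up with image rows. It measures the same quantity as the `np.argmax(line) < len(line)/2` test, i.e. the fraction of lines whose strongest peak has the smaller row index:

```python
import numpy as np

def upper_peak_fraction(binary_page, sigma=7.0):
    """Fraction of detected lines whose strongest projection peak is in the
    upper half of the line (smaller row index). Returns None if no line is found.

    binary_page: 2D array with text pixels > 0 and background 0.
    sigma: Gaussian width in pixels (illustrative default).
    """
    projection = binary_page.sum(axis=1).astype(float)

    # Symmetric Gaussian kernel, normalised to sum to 1.
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    smooth = np.convolve(projection, kernel, mode="same")

    # Rows above the mean are treated as belonging to a text line.
    mask = (smooth > smooth.mean()).astype(int)
    edges = np.diff(np.concatenate(([0], mask, [0])))
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0]  # exclusive end indices
    if len(starts) == 0:
        return None

    upper = sum(np.argmax(smooth[s:e]) < (e - s) / 2 for s, e in zip(starts, ends))
    return upper / len(starts)

# Synthetic page (made up): each "line" has a heavy band on top of a lighter band.
page = np.zeros((260, 100))
for top in (20, 80, 140, 200):
    page[top:top + 4, 10:70] = 1        # heavy band: 60 ink pixels per row
    page[top + 4:top + 20, 10:30] = 1   # light band: 20 ink pixels per row

print(upper_peak_fraction(page, sigma=2.0))
print(upper_peak_fraction(np.rot90(page, 2), sigma=2.0))
```

Which end of the range corresponds to "upright" depends on the font and script, so it is probably best calibrated on a few scans whose orientation is known.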

answered Sep 19 '22 03:09 by Obay



Python3/OpenCV4 script to align scanned documents.

Rotate the document and sum the rows. When the document is rotated by 0 or 180 degrees, there will be a lot of black pixels in the image:

[Image: rotate to find maximum zebra]

Use a score-keeping method: score each rotated image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.

# Note: rotate, sum_rows, display_data and src come from the full code (omitted here)
scores = []  # assumed initialisation; not shown in the original snippet
# Rotate the image around in a circle
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
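
The scoring idea can be checked in isolation. The sketch below is numpy-only and uses a made-up synthetic page. `sum_rows` here is a hypothetical stand-in for the omitted helper of the same name, and `zebra_score` is a name introduced for illustration. As in the loop above, the score is the number of rows that contain any ink, and the lowest score wins:

```python
import numpy as np

def sum_rows(binary):
    """Hypothetical stand-in for the omitted helper: sum each row of a binary image."""
    return binary.sum(axis=1)

def zebra_score(binary):
    """Number of rows containing any ink; fewer rows means cleaner stripes."""
    return int(np.count_nonzero(sum_rows(binary)))

# Synthetic page (made up): three horizontal "text lines" separated by blank rows.
page = np.zeros((90, 90), dtype=np.uint8)
for top in (10, 40, 70):
    page[top:top + 8, 5:85] = 255

aligned = zebra_score(page)             # only the 3 x 8 text rows contain ink
sideways = zebra_score(np.rot90(page))  # lines run vertically, so most rows contain ink
print(aligned, sideways)
```

A page whose text lines are horizontal leaves the gaps between lines empty, which is what keeps the count low. This score is the same at 0 and 180 degrees, which is why a separate upside-down check is still needed.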

[Image: best rotation]

How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:

[Images: right-side up | upside down]

# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down:
    aligned_image = best_rotation
else:
    aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
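
Since `area_to_top_of_text` is not shown, here is one possible way to measure that area, as a numpy-only sketch. `area_above_text` and `pick_upright` are hypothetical names, the per-column measurement is an assumption about what the omitted helper does, and the test page is synthetic:

```python
import numpy as np

def area_above_text(binary):
    """Count background pixels between the top edge and the first ink pixel
    in each column. Columns without ink contribute the full image height."""
    has_ink = binary > 0
    first_ink_row = np.where(has_ink.any(axis=0), has_ink.argmax(axis=0), binary.shape[0])
    return int(first_ink_row.sum())

def pick_upright(binary):
    """Return the image or its 180-degree rotation, whichever has less area above the text."""
    flipped = np.rot90(binary, 2)
    return binary if area_above_text(binary) <= area_above_text(flipped) else flipped

# Synthetic page (made up): a full-width first line and a short last line.
page = np.zeros((50, 40), dtype=np.uint8)
page[5:10, :] = 255      # first line spans the full width
page[30:35, :10] = 255   # last line of the paragraph is short

print(area_above_text(page), area_above_text(np.rot90(page, 2)))
```

The heuristic relies on the page layout being asymmetric, for example paragraphs ending in short lines, so it may not hold for every document.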
answered Sep 21 '22 03:09 by Stephen Meschke
