Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting text out of images

I am working on extracting text out of images.

Initially images are colored with text placed in white, On further processing the images, the text is shown in black and other pixels are white (with some noise), here is a sample:

Now when I try OCR using pytesseract (tesseract) on it, I still am not getting any text.

Is any solution possible to extract text from colored images?

like image 484
Yash Arora Avatar asked Sep 17 '17 05:09

Yash Arora


People also ask

How do I extract text from an image?

The text extractor will allow you to extract text from any image. You may upload an image or document (.doc,.pdf) and the tool will pull text from the image. Once extracted, you can copy to your clipboard with one click.

How do I extract text from an image in Workbench?

The text extractor will allow you to extract text from any image. You may upload an image or document (.pdf) and the tool will pull text from the image. Once extracted, you can copy to your clipboard with one click. Explore other Workbench solutions File Converter Tool

How do I extract text from a scanned PDF file?

How can I extract text from a scanned PDF? You can capture text from a scanned image, upload your image file from your computer, or take a screenshot on your desktop. Then simply right click on the image, and select Grab Text. The text from your scanned PDF can then be copied and pasted into other programs and applications.

How do I extract captions from an image?

Upload your image or drag & drop it. Or enter the URL if you have a link to the image. Hit the Submit button. Copy the text to the clipboard or save it as a document. Capturing text from images is totally free. You don’t have to spend a single penny to extract captions from your favorite photos.


1 Answers

from PIL import Image
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# Apply an "average" blur to the image

blurred = cv2.blur(image, (3,3))
cv2.imshow("Blurred_image", blurred)
img = Image.fromarray(blurred)
text = pytesseract.image_to_string(img, lang='eng')
print (text)
cv2.waitKey(0)

As as result i get = "Stay: in an Overwoter Bungalow $3»"

What about using Contour and taking unnecessary blobs from it ? might work

like image 80
Deepan Raj Avatar answered Oct 17 '22 12:10

Deepan Raj