 

How to increase the resolution of text in scanned images in Python?

I use Tesseract OCR to extract text from scanned images. For a few images the text is not recognized properly because of the low resolution, and the output consists of irrelevant characters.

Techniques applied:

  1. Increasing the DPI to 300.

  2. Image pre-processing techniques in OpenCV.

  3. Upscaling the images using dnn_superres in OpenCV.

  4. Noise removal techniques.

  5. Referring to Git repos where super-resolution models are developed using deep learning.

  6. Improving Tesseract OCR quality by training tessdata.

Reference Links:

  1. Improve OCR accuracy from scanned documents
  2. image processing to improve tesseract OCR accuracy

Sample Image:

[sample image]

Is there any simple way in Python to improve the text quality without using a deep learning model?

Jennifer asked May 08 '20 09:05



1 Answer

I am aware you would prefer to upscale these input images without using deep learning, but I would still highly recommend experimenting with https://github.com/alexjc/neural-enhance, assuming you have the hardware needed to run the neural networks.

The results for your OCR input images could be promising. The documentation for the code is quite substantial.

Hope this helps you!

Matthew Smith answered Oct 17 '22 11:10