 

Changing image DPI for use with Tesseract

I am working on a project to recognize text in business cards and map it to the appropriate fields. I am using OpenCV for image processing, and I need to feed the preprocessed image to the Tesseract-OCR engine for text recognition. This link states that images should have a DPI of at least 300. My image is 2560x1536 pixels at 72 DPI.

  • How to increase the DPI to 300?
  • It is also said that it is beneficial to resize the image. How should I resize my image for the best OCR results?
  • "Tesseract works best on images which have a DPI of at least 300 dpi, so it may be beneficial to resize images." What does "so" imply here? What is the relation between resizing an image and its DPI?
asked May 21 '17 10:05 by SriChakra





1 Answer

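For OCR, what really matters is the resolution in pixels, because the physical size of the characters can range from tiny to huge independently of the DPI of the acquisition device. The DPI value stored in the file only relates pixels to physical inches; changing it from 72 to 300 does not add a single pixel of detail.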
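To make the pixel/DPI relation concrete, here is a small arithmetic sketch (plain Python, with hypothetical helper names). It uses the question's 2560x1536 image and assumes a standard US business card of 3.5 x 2 inches; the only relation involved is pixels = inches × DPI.

```python
def physical_size_inches(width_px, height_px, dpi):
    """Print size implied by a pixel size and a DPI tag (hypothetical helper)."""
    return width_px / dpi, height_px / dpi


def pixels_for(width_in, height_in, dpi):
    """Pixels needed to cover a physical size at a given DPI (hypothetical helper)."""
    return round(width_in * dpi), round(height_in * dpi)


# The image from the question: 2560x1536 pixels tagged as 72 DPI.
w_in, h_in = physical_size_inches(2560, 1536, 72)
print(f"{w_in:.1f} x {h_in:.1f} in")  # 35.6 x 21.3 in

# Re-tagging the same pixels as 300 DPI only shrinks the implied print size;
# the pixel count, and therefore the detail available to OCR, is unchanged.
w_in, h_in = physical_size_inches(2560, 1536, 300)
print(f"{w_in:.2f} x {h_in:.2f} in")  # 8.53 x 5.12 in

# Assumed 3.5 x 2 in card scanned at 300 DPI:
print(pixels_for(3.5, 2.0, 300))  # (1050, 600)
```

So "at least 300 DPI" is really shorthand for "enough pixels per character". Resizing changes the pixel count, while editing the DPI tag does not.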

As a rule of thumb, a stroke width of around 3 pixels is a good start. If it is lower, resizing might not help, because the information is already missing. If it is much higher, the running time might be excessive (or the OCR function might not be tailored to deal with it).
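One way to apply this rule of thumb is to estimate the stroke width, then derive a resize factor from it. The following is a rough heuristic sketch, not part of the answer. It assumes a binarized image given as rows of 0/1 values (1 = ink) and estimates stroke width as the median length of horizontal ink runs. The 3-pixel target and the upscaling cap of 4 are assumptions; the cap reflects the point that upscaling cannot restore detail that was never captured.

```python
from statistics import median


def estimate_stroke_width(binary):
    """Rough stroke width: median length of horizontal runs of ink pixels.

    `binary` is a list of rows with 1 for ink and 0 for background.
    A horizontal cut through a vertical stroke is as long as the stroke
    is wide, and vertical strokes dominate most Latin text.
    (Hypothetical heuristic, not a library function.)
    """
    runs = []
    for row in binary:
        length = 0
        for px in row:
            if px:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # a run touching the right edge
            runs.append(length)
    return median(runs) if runs else 0


def scale_factor(stroke_px, target_px=3.0, max_scale=4.0):
    """Resize factor that brings strokes to about target_px wide.

    Upscaling is capped at max_scale (an assumed value), since very thin
    strokes have lost information that interpolation cannot recover.
    """
    if stroke_px <= 0:
        return 1.0
    return min(target_px / stroke_px, max_scale)


# Toy 4x9 "image": two vertical strokes, each 2 pixels wide.
toy = [[0, 1, 1, 0, 0, 0, 1, 1, 0]] * 4
sw = estimate_stroke_width(toy)
print(sw, scale_factor(sw))  # 2.0 1.5
```

With OpenCV, such a factor could then be applied as `cv2.resize(img, None, fx=s, fy=s, interpolation=cv2.INTER_CUBIC)` when enlarging, or with `cv2.INTER_AREA` when shrinking.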

Also check that the package will not try to resize the image internally, based on its own assumption about stroke width and on the DPI stored in the image header, when the two do not match.
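With Tesseract 4 and later, one way to avoid a misleading header value (such as the 72 DPI here) is to pass the real resolution explicitly with the `--dpi` command-line option. The sketch below builds such a command. The `effective_dpi` helper and the assumed 3.5-inch card width are illustrative assumptions: the idea is to measure how many pixels the card spans in the photo and divide by its physical width.

```python
import shutil
import subprocess


def effective_dpi(card_width_px, card_width_in=3.5):
    """Actual resolution of the card in the photo (hypothetical helper;
    3.5 in is an assumed card width)."""
    return round(card_width_px / card_width_in)


def tesseract_cmd(image_path, dpi, lang="eng", psm=None):
    """Build a Tesseract CLI call that prints text to stdout and states the DPI
    explicitly instead of relying on the image header."""
    cmd = ["tesseract", image_path, "stdout", "--dpi", str(dpi), "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image_path, dpi):
    """Run Tesseract if it is installed and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_cmd(image_path, dpi), capture_output=True, text=True, check=True
    )
    return result.stdout


# Example: if the card spans 2100 px horizontally in the photo.
print(tesseract_cmd("card.png", effective_dpi(2100)))
```

From Python via pytesseract, the same option can be passed as `config="--dpi 300"`.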

answered Oct 25 '22 19:10 by Yves Daoust
