
What's the best image input type for Tesseract?

I'm using Tesseract on a project and want to know which image input type gives the best output. Is a binary (1-bit) TIFF the best input, or is there something better?

chostDevil asked Apr 17 '12 14:04


People also ask

What is the best image format for OCR?

We always recommend feeding the OCR engine images saved with the following specifications: 1- High resolution (300 DPI is good). 2- Saved as 1-bit (black and white) mode. 3- Saved in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.

Does Tesseract support JPG?

File input formats: Tesseract will only take image files for input. These include TIFF (preferred) and JPG.

Does Tesseract support PNG?

Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, and TIFF.


2 Answers

I had excellent results using TIFF in the past for a similar task. At the time I did some pre-processing using OpenCV and exported the result to a TIFF file that was later passed to Tesseract. It worked pretty well.
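The answer does not say which pre-processing steps were used, so the following is only a sketch of one common choice, Otsu binarization, written in plain Python so it runs without extra libraries. The function names `otsu_threshold` and `binarize` are made up for this illustration and are not part of OpenCV or Tesseract.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for an iterable of gray values in 0..255."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """image: list of rows of gray values. Returns rows of 0 (black) or 255 (white)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[255 if p > t else 0 for p in row] for row in image]


# Tiny hypothetical "scan": dark text pixels next to bright paper pixels.
sample = [[10, 12, 200],
          [11, 210, 205]]
binary = binarize(sample)
```

With OpenCV itself, the same step is usually a call to `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)`, and the result can be written with `cv2.imwrite("page.tif", binary)` before running `tesseract page.tif out` (the file names here are placeholders). Whether binarizing yourself beats Tesseract's built-in thresholding depends on the images, so it is worth comparing both on a few samples.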

karlphillip answered Sep 28 '22 05:09


I've found TIFF to give far superior results to JPG, as well as being the best of all the formats.

The original Tesseract programme would only work with TIFF files, which leads me to believe it is the most appropriate format.

Contangwardation answered Sep 28 '22 05:09