I'm using Tesseract on a project and want to know which image input type gives the best output. Is a binary (1-bit) TIFF the best input, or is there something better?
We always recommend feeding the OCR engine images saved with the following specifications:
1. High resolution (300 DPI is good).
2. Saved in 1-bit (black-and-white) mode.
3. Saved in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.
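Here is a minimal sketch of how an image could be saved to match those three specifications. It assumes the Pillow library is installed (with libtiff support, as in the standard wheels). `save_for_ocr` is a hypothetical helper, and the fixed threshold of 128 is an illustrative choice, not part of the recommendation above. Note that the `dpi` value is only written into the TIFF metadata. It does not add real detail, so the source still has to be scanned at around 300 DPI.

```python
from PIL import Image


def save_for_ocr(in_path: str, out_path: str, dpi: int = 300, threshold: int = 128) -> None:
    """Hypothetical helper: save an image as a 1-bit, Group 4 compressed TIFF."""
    with Image.open(in_path) as src:
        gray = src.convert("L")  # 8-bit grayscale
    # Fixed global threshold (an assumption; adaptive methods may do better on uneven lighting).
    bw = gray.point(lambda p: 255 if p >= threshold else 0)
    # Mode "1" = 1 bit per pixel. The input is already pure 0/255,
    # so the default dithering has no error to diffuse.
    bw = bw.convert("1")
    # CCITT Group 4 is lossless and only applies to 1-bit images.
    # dpi only tags the file's metadata; it does not resample the pixels.
    bw.save(out_path, format="TIFF", compression="group4", dpi=(dpi, dpi))
```

For example, `save_for_ocr("scan.png", "scan.tif")` writes a black-and-white TIFF that can then be passed to Tesseract.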
File input formats: Tesseract only accepts image files as input. These include TIFF (preferred) and JPG.
Any image readable by Leptonica is supported in Tesseract, including BMP, PNM, PNG, JFIF, JPEG, and TIFF.
I had excellent results using TIFF for a similar task in the past. At the time, I did some preprocessing with OpenCV and exported the result to a TIFF file, which was then passed to Tesseract. The output was quite good.
I've found TIFF gives far better results than JPG, and the best results of all the formats I've tried.
The original Tesseract programme only worked with TIFF files, which leads me to believe TIFF is the most appropriate format.