Using tesseract-ocr 3.02.02.
The basic usage of tesseract is
tesseract sourc.png result
and result.txt is generated. To get the result text, I have to cat this file.
Is there any option to dump the result to stdout?
Tesseract's standard output is a plain text file (UTF-8 encoded, with '\n' as the end-of-line marker and FF as a form feed character after each page). With the configfile option set to pdf, tesseract will produce searchable PDF pages containing images with a hidden, searchable text layer.
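If the text output really does carry a form feed after each page as described above, splitting a multi-page result back into pages is straightforward. Here is a minimal Python sketch; the function name split_pages is made up for illustration and is not part of tesseract:

```python
from typing import List


def split_pages(ocr_output: str) -> List[str]:
    """Split tesseract text output into pages on the form feed character.

    Assumes each page is terminated by '\f' (FF), as described above.
    """
    pages = ocr_output.split("\f")
    # A trailing form feed leaves one empty string at the end; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages
```

Each returned string keeps its own '\n' line endings, so it can be written out or processed page by page.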
Inevitably, noise in an input image, non-standard fonts that Tesseract wasn't trained on, or less than ideal image quality will cause Tesseract to make a mistake and incorrectly OCR a piece of text.
While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance, that is, its ability to correctly recognize characters in a scan or image.
Tesseract tests the text lines to determine whether they are fixed pitch. Where it finds fixed pitch text, Tesseract chops the words into characters using the pitch, and disables the chopper and associator on these words for the word recognition step.
The solution is:
tesseract input.jpg stdout
But you need at least version 3.03, where support for stdout was added, so upgrade if you are on an older release.
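If you are calling tesseract from a script, something like the following Python sketch covers both cases: it uses stdout as the output name on 3.03 or later, and falls back to the result.txt approach from the question on older versions. The helper names build_command and ocr_text are my own, and it assumes the tesseract binary is on your PATH:

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(image_path, to_stdout=True, output_base="result"):
    """Build the tesseract command line as an argument list.

    With to_stdout=True the output name is the literal word "stdout",
    which needs tesseract 3.03 or later.
    """
    target = "stdout" if to_stdout else str(output_base)
    return ["tesseract", str(image_path), target]


def ocr_text(image_path, to_stdout=True):
    """Return the recognized text for image_path as a str."""
    if to_stdout:
        proc = subprocess.run(
            build_command(image_path),
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,  # keep tesseract's messages out of the terminal
        )
        return proc.stdout.decode("utf-8")

    # Older versions: let tesseract write <base>.txt, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "result"
        subprocess.run(
            build_command(image_path, to_stdout=False, output_base=base),
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        return Path(str(base) + ".txt").read_text(encoding="utf-8")
```

For example, ocr_text("input.jpg") runs the same command as in the answer above, while ocr_text("input.jpg", to_stdout=False) does the write-then-read step for you in a temporary directory.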