Can `tesseract-ocr` write the result to STDOUT?

I am using tesseract-ocr 3.02.02.

The basic usage of tesseract is

tesseract source.png result

and result.txt is generated. To get the result text, I have to cat this file.

Is there an option to write the result to stdout instead?

otiai10 asked Jun 22 '14 03:06


People also ask

What is the output of Tesseract?

Tesseract's default output is a plain text file (UTF-8 encoded, with '\n' as the end-of-line marker and FF as a form feed character after each page). With the configfile option set to pdf, tesseract will produce searchable PDF pages containing images with a hidden, searchable text layer.

Why is the Tesseract OCR not accurate?

Inevitably, noise in an input image, non-standard fonts that Tesseract wasn't trained on, or less than ideal image quality will cause Tesseract to make a mistake and incorrectly OCR a piece of text.

Is Tesseract good for OCR?

While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance, that is, its ability to correctly recognize characters in a scan or image.

How does Tesseract OCR work?

Tesseract tests the text lines to determine whether they are fixed pitch. Where it finds fixed pitch text, Tesseract chops the words into characters using the pitch, and disables the chopper and associator on these words for the word recognition step.


2 Answers

The solution is:

tesseract input.jpg stdout 

But you need at least version 3.03.
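If upgrading is not an option, a wrapper can hide the temporary file. The following is only a sketch, not something tesseract provides: the function name `ocr_to_stdout` and the `TESSERACT` override variable are made up for illustration, and it assumes the output base name gets `.txt` appended, as with the `result.txt` file in the question.

```shell
# ocr_to_stdout IMAGE
# Hypothetical helper: runs tesseract into a temporary directory and
# prints the recognized text on stdout, so it can be used in a pipeline.
# TESSERACT is an assumed override for the binary name (default: tesseract).
ocr_to_stdout() {
    ocr_image=$1
    ocr_tmpdir=$(mktemp -d) || return 1

    # The second argument is a base name; the text lands in "$ocr_tmpdir/out.txt".
    if "${TESSERACT:-tesseract}" "$ocr_image" "$ocr_tmpdir/out" >/dev/null 2>&1; then
        cat "$ocr_tmpdir/out.txt"
        ocr_status=$?
    else
        ocr_status=$?
    fi

    # Always clean up the temporary directory, even on failure.
    rm -rf "$ocr_tmpdir"
    return "$ocr_status"
}
```

With this defined, `ocr_to_stdout input.jpg | grep something` behaves much like `tesseract input.jpg stdout | grep something` does on 3.03 and later.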

Arnold Roa answered Oct 01 '22 15:10


You should upgrade to v3.03, where support for stdout was added.

Simeon Visser answered Oct 01 '22 15:10