
Is tesseract 3.00 multi-threaded?

I have read other posts suggesting that multi-threading support would be added in 3.00, but I'm not sure whether it actually made it into the 3.00 release.

Apart from multi-threading, is running multiple tesseract processes a feasible way to achieve concurrency?

Thanks.

pshah asked Feb 10 '11 21:02


1 Answer

One thing I've done is use GNU Parallel to run as many Tesseract instances as a multi-core system can handle, after converting multi-page documents to single-page images.

GNU Parallel is a small program that is easy to build on most Linux distros (I'm using openSUSE 11.4).

Here's the command line that I use:

/usr/local/bin/parallel -j 4 \
   /usr/local/bin/tesseract -psm 1 -l eng {} {.} \
   ::: /tmp/tmp/*.jpg

The -j 4 option tells parallel to run up to four jobs at a time, one for each of the four CPU cores on my server.

If you run this and start top in another terminal, you'll see up to four tesseract processes at a time until it has worked through all of the JPGs in the specified directory.

With the job count set to the number of CPU cores, the load from this job should stay at or below that number (if you run Linux).
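
If GNU Parallel isn't installed, the same one-process-per-core idea can be roughly sketched with find and xargs. This is not part of my original setup: the function name ocr_dir and the OCR_CMD override are made up for illustration, and it relies on the GNU extensions -print0, -0, -r and -P. The page segmentation flag is left out because its spelling differs between Tesseract versions (-psm in 3.x, --psm in 4 and later).

```shell
#!/bin/sh
# ocr_dir DIR: OCR every *.jpg in DIR, running at most one job per CPU core.
# OCR_CMD is a hypothetical override (default: tesseract) so the function
# can be dry-run with a stand-in command such as echo.
ocr_dir() {
    dir=$1
    # Number of online cores; fall back to 1 if getconf cannot report it.
    jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    # One file per invocation (-n 1), up to $jobs invocations at once (-P).
    # ${1%.*} strips the extension, like {.} in GNU Parallel.
    find "$dir" -maxdepth 1 -type f -name '*.jpg' -print0 |
        OCR_CMD=${OCR_CMD:-tesseract} xargs -0 -r -n 1 -P "$jobs" \
            sh -c '"$OCR_CMD" "$1" "${1%.*}" -l eng' _
}
```

Calling ocr_dir /tmp/tmp would process the same directory as the parallel command above. Running it first as OCR_CMD=echo ocr_dir /tmp/tmp prints the command arguments instead of running OCR, which is a cheap way to check the file list.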

Here's the link to GNU Parallel:

http://www.gnu.org/software/parallel/

Armando Ortiz answered Oct 24 '22 00:10