
Is Tesseract (an OCR engine) reentrant?

I am doing OCR using Tesseract on a quad-core processor. For better speed, I want to read 4 words at a time, using 4 threads. Is it safe to call Tesseract from multiple threads concurrently?

Note: each thread will be working on a different, non-shared image.

Note: guarding the calls with locks is not an option, because serializing them would defeat the speedup.

Hristo Hristov asked Jan 28 '11 at 11:01



2 Answers

According to the release notes, Tesseract is thread-safe (mostly, and to the degree you need) as of version 3.01 (Oct 21, 2011):

Thread-safety! Moved all critical globals and statics to members of the appropriate class. Tesseract is now thread-safe (multiple instances can be used in parallel in multiple threads.) with the minor exception that some control parameters are still global and affect all threads.

I've been using it successfully on multiple cores ever since (and before that, from the dev branch).
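A minimal Python sketch of the "multiple instances in parallel threads" pattern from the release notes, assuming Tesseract 3.01 or later and the third-party `tesserocr` binding (its `PyTessBaseAPI`, `SetImageFile`, `GetUTF8Text` and `End` methods). `ocr_all`, `default_engine` and `engine_factory` are hypothetical names for illustration. Each worker thread lazily creates its own engine and never shares it, so no lock is held around the OCR calls. The threads only run in parallel if the binding releases Python's GIL during recognition, which is assumed here. Since some control parameters are still global, avoid giving them different values in different threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def default_engine():
    # Assumption: the third-party tesserocr binding is installed.
    import tesserocr
    return tesserocr.PyTessBaseAPI(lang="eng")


def ocr_all(paths, workers=4, engine_factory=default_engine):
    """OCR each image file on a pool of threads; results keep input order."""
    local = threading.local()        # per-thread storage for that thread's engine
    created = []                     # every engine made, so all can be released
    created_lock = threading.Lock()  # guards only this list, never the OCR calls

    def engine():
        # Lazily create one private engine per worker thread; it is never shared.
        if not hasattr(local, "api"):
            local.api = engine_factory()
            with created_lock:
                created.append(local.api)
        return local.api

    def work(path):
        api = engine()
        api.SetImageFile(path)
        return api.GetUTF8Text()

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(work, paths))
    finally:
        for api in created:
            api.End()  # free each engine's native resources
```

With `workers=4` on a quad-core machine, at most four engines are created, one per thread, no matter how many images are processed.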

Kaolin Fire answered Sep 20 '22 at 06:09



I don't think Tesseract is currently parallelizable (see this thread), although one of the main goals for v3.0 is to make it more thread-safe.

However, you can always parallelize by running n concurrent Tesseract processes. To parallelize the OCR of a single image, you would have to split the image yourself and feed one part to each of the n processes (basically a map-reduce).
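A sketch of the process-based approach, assuming the `tesseract` command-line tool is on `PATH` and accepts `stdout` as the output base (recent releases do). `strip_boxes`, `tesseract_cmd` and `ocr_processes` are hypothetical helpers, not part of Tesseract. The map step runs one tesseract process per image part; the Python threads only wait on those child processes, so the OCR itself runs on separate cores. The reduce step concatenates the outputs in input order. Cropping each strip to its own file (for example with Pillow's `Image.crop`, which takes boxes in the same `(left, upper, right, lower)` order) is left out. Equal-height strips are only an illustration: real cuts should fall between text lines, or characters on the cut will be lost.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def strip_boxes(width, height, n):
    """Split a width x height image into n horizontal strips.

    Returns (left, upper, right, lower) boxes that tile the image exactly.
    """
    n = max(1, min(n, height))  # at least one strip, at most one per pixel row
    step, extra = divmod(height, n)
    boxes, top = [], 0
    for i in range(n):
        bottom = top + step + (1 if i < extra else 0)  # spread the remainder
        boxes.append((0, top, width, bottom))
        top = bottom
    return boxes


def tesseract_cmd(path):
    # Assumption: tesseract is on PATH and "stdout" as output base prints the text.
    return ["tesseract", path, "stdout"]


def ocr_processes(paths, workers=4, build_cmd=tesseract_cmd):
    """Map: one OCR process per image part. Reduce: join outputs in input order."""
    def run(path):
        result = subprocess.run(
            build_cmd(path), capture_output=True, text=True, check=True
        )
        return result.stdout

    # Threads only wait on child processes; the OCR runs in separate processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return "".join(pool.map(run, paths))
```

Separate processes sidestep thread-safety entirely, at the cost of loading the language data once per process.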

Mauricio Scheffer answered Sep 21 '22 at 06:09
