running multiple tesseract instances in parallel using multiprocessing not returning any results

Question

I'm writing a Python script that uses the multiprocessing library to launch multiple Tesseract instances in parallel. When I make the calls to tesseract sequentially in a loop, it works. However, when I try to parallelize the code, everything looks fine but I don't get any results (I waited for 10 minutes).

In my code I try to OCR multiple PDF pages after splitting them from the original multi-page PDF.

Here's my code:

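import subprocess
from multiprocessing import Pool

def processPage(i):
    # Input image and hOCR output base name for page i
    nameJPG = "converted-" + str(i) + ".jpg"
    nameHocr = "converted-" + str(i)
    # check_call raises CalledProcessError if tesseract exits with an error
    subprocess.check_call(["tesseract", nameJPG, nameHocr, "-l", "eng", "hocr"])
    print("tesseract did the job for page %d" % (i + 1))

if __name__ == "__main__":
    # pdf is defined earlier in the script (not shown)
    pool1 = Pool(4)
    pool1.map(processPage, range(len(pdf.pages)))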

vsnu · Accepted Answer

As far as I know, Tesseract does not cope well with several instances running at once. If you have a quad-core machine and run 4 processes simultaneously, Tesseract will choke: you will see high CPU usage and very little progress.

If you need this for a company and you don't want to go with the Google Vision API, you have to set up multiple servers and use socket programming to request text from them, so that the number of parallel processes on each server stays below what it can actually run at the same time. For a quad-core machine that means 2 or 3. Otherwise you can use the Google Vision API; they have a lot of servers and their output is quite good too.

Disabling multithreading inside Tesseract will also help. It can be done by setting OMP_THREAD_LIMIT=1 in the environment. Even then, you should avoid running many Tesseract processes on the same server.

See https://github.com/tesseract-ocr/tesseract/issues/898#issuecomment-315202167
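
To make the two suggestions above concrete, here is a minimal sketch (not the asker's final code) that passes OMP_THREAD_LIMIT=1 to each tesseract call and keeps the pool smaller than the core count. The helper names tesseract_env, tesseract_command and ocr_pages, the default of 2 workers, and the command-line page count are my own assumptions for illustration; the file naming and tesseract arguments are taken from the question.

```python
import os
import subprocess
import sys
from multiprocessing import Pool


def tesseract_env():
    """Copy the current environment and cap Tesseract's OpenMP threads at one."""
    env = dict(os.environ)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def tesseract_command(i):
    """Build the same command line the question uses for page index i."""
    name_jpg = "converted-%d.jpg" % i
    name_hocr = "converted-%d" % i
    return ["tesseract", name_jpg, name_hocr, "-l", "eng", "hocr"]


def process_page(i):
    """OCR one page; check_call raises CalledProcessError if tesseract fails."""
    subprocess.check_call(tesseract_command(i), env=tesseract_env())
    return i


def ocr_pages(page_count, workers=2):
    """Run process_page over all page indices with a small worker pool.

    workers=2 is an assumed default for a quad-core machine, following the
    "2 or 3" advice above; tune it for your hardware.
    """
    pool = Pool(workers)
    try:
        return pool.map(process_page, range(page_count))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python ocr_pages.py <page_count>
    done = ocr_pages(int(sys.argv[1]))
    print("OCR finished for %d pages" % len(done))
```

With the thread limit in place each tesseract process should stay on roughly one core, so you can experiment with raising workers and see whether throughput actually improves on your machine.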

Tags:

python

multiprocessing

tesseract

hamma
