 

How can I run tesseract with multiple languages one time?


I have to analyze an image that contains both English and Japanese text. When I run tesseract with the default language (-l eng), some Japanese characters are lost. If I instead run it with Japanese (-l jpn), some English characters are lost (e.g. in an email address).

How can I run a single process that recognizes both English and Japanese characters?

asked Jun 24 '14 06:06 by pars




2 Answers

Since tesseract 3.02 it is possible to specify multiple languages for the -l parameter.

-l lang The language to use. If none is specified, English is assumed. Multiple languages may be specified, separated by plus characters. Tesseract uses 3-character ISO 639-2 language codes.

An example:

tesseract myscan.png out -l deu+eng 
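If you call the CLI from a script, a minimal Python sketch of the same idea is below. It assumes the `tesseract` binary is on your PATH. Before joining the codes with `+`, it checks that each requested language's traineddata is actually installed by parsing `tesseract --list-langs`, because a missing language (e.g. jpn) makes the run fail. The helper names `parse_list_langs`, `build_lang_arg` and `ocr_multilang` are hypothetical and not part of any library.

```python
import re
import subprocess

# Accept bare codes such as "eng", "chi_sim" or "script/Japanese"; lines that
# contain spaces (the "List of available languages ..." header, warnings) are skipped.
_LANG_CODE = re.compile(r"[\w/.-]+")


def parse_list_langs(output: str) -> list[str]:
    """Extract installed language codes from `tesseract --list-langs` output."""
    return [line.strip() for line in output.splitlines()
            if _LANG_CODE.fullmatch(line.strip())]


def build_lang_arg(wanted: list[str], available: list[str]) -> str:
    """Join language codes with '+' for -l, failing early if one is not installed."""
    missing = [code for code in wanted if code not in available]
    if missing:
        raise ValueError(f"traineddata not installed for: {', '.join(missing)}")
    return "+".join(wanted)


def ocr_multilang(image: str, outbase: str, wanted: list[str]) -> None:
    """Run `tesseract <image> <outbase> -l lang1+lang2...` (writes <outbase>.txt)."""
    listing = subprocess.run(["tesseract", "--list-langs"],
                             capture_output=True, text=True, check=True)
    # Depending on the release, the list may go to stdout or stderr; read both.
    available = parse_list_langs(listing.stdout + listing.stderr)
    lang_arg = build_lang_arg(wanted, available)
    subprocess.run(["tesseract", image, outbase, "-l", lang_arg], check=True)
```

For the case in the question, `ocr_multilang("myscan.png", "out", ["eng", "jpn"])` corresponds to `tesseract myscan.png out -l eng+jpn`.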
answered Oct 24 '22 10:10 by tobltobs


Try this:

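import pytesseract
from PIL import Image
from langdetect import detect_langs
img = Image.open('image.png')  # placeholder path to your scan
custom_config = r'-l eng+jpn --psm 6'  # English + Japanese, one uniform block of text
txt = pytesseract.image_to_string(img, config=custom_config)
print(detect_langs(txt))  # detected languages with probabilities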

Note: langdetect is a separate package; install it with:

 pip install langdetect 
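`detect_langs` reports language probabilities for the OCR text as a whole. If you instead want to separate the Japanese lines from the English ones (e.g. to pull out the email address), a lighter alternative is to check the Unicode script of each line. This is a swapped-in heuristic, not part of langdetect. It assumes a line is Japanese if it contains any Hiragana, Katakana or CJK ideograph. Because CJK ideographs are shared with Chinese, this is only an approximation. The function names are hypothetical.

```python
def has_japanese(text: str) -> bool:
    """Heuristic: True if text contains Hiragana, Katakana or CJK ideographs."""
    for ch in text:
        cp = ord(ch)
        if (0x3040 <= cp <= 0x309F          # Hiragana
                or 0x30A0 <= cp <= 0x30FF   # Katakana
                or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs (shared with Chinese)
                or 0xFF66 <= cp <= 0xFF9F): # half-width Katakana
            return True
    return False


def split_by_script(ocr_text: str) -> dict[str, list[str]]:
    """Group non-empty OCR lines into 'japanese' and 'latin' buckets."""
    result: dict[str, list[str]] = {"japanese": [], "latin": []}
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Mixed lines containing any Japanese character go to the Japanese bucket.
        result["japanese" if has_japanese(line) else "latin"].append(line)
    return result
```

You can call `split_by_script(txt)` on the string returned by `pytesseract.image_to_string` above. A line such as `Email: info@example.com` lands in `latin`. A mixed line like `こんにちは World` lands in `japanese`.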
answered Oct 24 '22 09:10 by rahul