I just installed Tesseract OCR, and after running the command $ tesseract --list-langs the output showed only two languages: eng and osd. My question is: how do I load another language, in my case specifically Japanese?
The only language pack installed with Tesseract on macOS is English, which is contained in the eng.traineddata file.
Since Tesseract 3.02 it is possible to specify multiple languages for the -l parameter:

-l lang  The language to use. If none is specified, English is assumed. Multiple languages may be specified, separated by plus characters (e.g. -l eng+jpn).
Unfortunately, Tesseract does not have a feature to automatically detect the language of the text in an image. An alternative is provided by another Python module, langdetect, which can be installed via pip.
The Tesseract OCR engine supports multiple languages. To detect characters from a specific language, the language needs to be specified when the OCR engine itself is created. English, German, Spanish, French, and Italian come embedded with the action, so they do not require additional parameters.
I learned that by grabbing the trained data from https://github.com/tesseract-ocr/tessdata, placing it in the same directory as the other trained data (i.e., alongside eng.traineddata), and passing the language flag -l LANG, tesseract should be able to read the language you've specified. In the following example, Japanese:

tesseract -l jpn sample-jpn.png output-jpn
This works for me on Debian/Ubuntu:

sudo apt-get install tesseract-ocr-jpn

Hope this helps.
1. pip install pytesseract
2. For Windows, install tesseract-ocr from https://digi.bib.uni-mannheim.de/tesseract and select all language options while installing.
3. Set the Tesseract path under anaconda/lib/site-packages/pytesseract/pytesseract.py:
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
(Alternatively, set pytesseract.pytesseract.tesseract_cmd in your own script instead of editing the library file.)
4. from pytesseract import image_to_string
print(image_to_string(test_file, 'jpn'))  # for Japanese text extraction