Tesseract OCR loading a language - Japanese

Tags:

tesseract

I just installed Tesseract OCR, and after running tesseract --list-langs the output showed only two languages, eng and osd. How do I load another language, in my case Japanese?

924

asked Aug 16 '17 15:08

Freddy

3 Answers

I learned that you can download the trained data from https://github.com/tesseract-ocr/tessdata and place it in the same directory as the existing trained data, i.e., next to eng.traineddata. After that, passing the language flag -l LANG lets tesseract read the language you specify. For Japanese, for example: tesseract -l jpn sample-jpn.png output-jpn
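The download step can also be scripted. Below is a minimal Python sketch, not an official tool: the helper names traineddata_url and install_language are made up for illustration, the "main" branch name is an assumption about the tessdata repository, and the directory in the example call is only a placeholder for wherever eng.traineddata lives on your system.

```python
import urllib.request
from pathlib import Path

# Repository named in the answer above.
TESSDATA_REPO = "https://github.com/tesseract-ocr/tessdata"


def traineddata_url(lang: str, branch: str = "main") -> str:
    """Build the download URL for one language file.

    Assumption: files are served from the "main" branch via GitHub's /raw/ path.
    """
    return f"{TESSDATA_REPO}/raw/{branch}/{lang}.traineddata"


def install_language(lang: str, tessdata_dir: str) -> Path:
    """Download <lang>.traineddata into the directory holding eng.traineddata."""
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    if not target.exists():
        # Writing here may need the same permissions as the Tesseract install.
        urllib.request.urlretrieve(traineddata_url(lang), target)
    return target


# Example call (placeholder directory, adjust to your installation):
# install_language("jpn", "/usr/share/tesseract-ocr/4.00/tessdata")
```

Once the file is in place, tesseract --list-langs should list jpn alongside eng and osd.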

191

answered Jan 01 '23 10:01

Freddy

This works for me:

sudo apt-get install tesseract-ocr-jpn

Hope this helps.
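To confirm the package actually added the language, you can check the output of tesseract --list-langs from a script. This is a rough sketch: parse_langs and installed_languages are hypothetical helper names, and the header line format ("List of available languages ...") is an assumption that may differ between Tesseract versions.

```python
import subprocess
from typing import List


def parse_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line (format assumed, see above).
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages(tesseract_cmd: str = "tesseract") -> List[str]:
    """Run tesseract and return the languages it reports."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Assumption: some older releases print the list on stderr instead of stdout.
    return parse_langs(result.stdout or result.stderr)


# Example check after installing the package:
# print("jpn" in installed_languages())
```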

answered Jan 01 '23 09:01

Harald

1. Install the Python wrapper: pip install pytesseract

2. On Windows, install tesseract-ocr from
https://digi.bib.uni-mannheim.de/tesseract
and select all language options while installing.

3. Set the path to the tesseract executable in anaconda/lib/site-packages/pytesseract/pytesseract.py:

tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

4. Extract the text, passing 'jpn' as the language:

from pytesseract import image_to_string
test_file = 'sample-jpn.png'  # path to the image to read
print(image_to_string(test_file, lang='jpn'))  # Japanese text extraction
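As an alternative to editing pytesseract.py inside site-packages (step 3), the executable path can be set at runtime through pytesseract.pytesseract.tesseract_cmd. A minimal sketch, assuming pytesseract is installed; ocr_image is a hypothetical wrapper name and both paths in the example call are placeholders.

```python
def ocr_image(image_path, lang="jpn", tesseract_cmd=None):
    """Run Tesseract on one image file and return the recognized text."""
    # Imported here so the sketch can be loaded without pytesseract installed.
    import pytesseract

    if tesseract_cmd is not None:
        # Point pytesseract at the executable for this process only,
        # instead of changing the installed library file.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    # image_to_string accepts a file path as well as an image object.
    return pytesseract.image_to_string(image_path, lang=lang)


# Example call (placeholder paths):
# print(ocr_image("sample-jpn.png",
#                 tesseract_cmd=r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"))
```

Setting the path this way survives upgrades of pytesseract, since nothing inside the package is modified.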

answered Jan 01 '23 08:01

Amir
