
Tesseract 4 couldn't load any languages when used with OCR Engine mode - "Legacy + LSTM engines" (--oem 2)

I think this issue is specific to Tesseract 4, which comes with LSTM support. As I am on a 64-bit Windows system, I downloaded the 64-bit Windows executable from here: https://github.com/UB-Mannheim/tesseract/wiki

It has the following OCR Engine modes:

  • 0 Legacy engine only.
  • 1 Neural nets LSTM engine only.
  • 2 Legacy + LSTM engines.
  • 3 Default, based on what is available.

It works with all the modes except 2.


When run with --oem 1

tesseract --oem 1 1.jpg 1

Result:

Tesseract Open Source OCR Engine v4.0.0.20190314 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 561
Detected 5 diacritics

and it creates a file 1.txt with the corresponding OCR result.


When run with --oem 2

tesseract --oem 2 1.jpg 1

Result:

Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

and no output is generated.


I thought the problem might be with the language installation, so I ran

tesseract --list-langs

which gave me the following result:

List of available languages (2):
eng
osd

I even checked the tessdata folder manually:

[screenshot of the tessdata folder]

which clearly shows that I already have the eng language.

Can anyone help me find the exact problem that is preventing me from using the Legacy + LSTM engines (--oem 2) mode?

Shivam K. Thakkar asked Dec 13 '22 12:12



1 Answer

Yes, you have the eng language, but the traineddata file you have supports the LSTM engine only. If you want both LSTM and Legacy support, you need to download the language data from the tessdata repository.
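
As a rough sketch of that step, the following POSIX shell snippet (on Windows it would need something like Git Bash or WSL) builds the download URL for a language file and fetches it into a separate folder. The helper names traineddata_url and fetch_traineddata, the folder name tessdata-full, and the raw-download URL layout (the main branch of https://github.com/tesseract-ocr/tessdata) are assumptions for illustration, not something stated above; adjust them to your setup.

```shell
#!/bin/sh
# Assumption: raw files of the tessdata repository are served under this base URL.
TESSDATA_REPO_URL="https://github.com/tesseract-ocr/tessdata/raw/main"

# Hypothetical helper: print the download URL for a language code such as "eng".
traineddata_url() {
    printf '%s/%s.traineddata\n' "$TESSDATA_REPO_URL" "$1"
}

# Hypothetical helper: download <lang>.traineddata into a target folder.
#   $1 = language code, $2 = target folder (created if missing)
fetch_traineddata() {
    mkdir -p "$2" && curl -fL -o "$2/$1.traineddata" "$(traineddata_url "$1")"
}

# Example usage (not run automatically):
#   fetch_traineddata eng ./tessdata-full
#   tesseract --tessdata-dir ./tessdata-full --oem 2 1.jpg 1
```

Keeping the downloaded file in its own folder and pointing tesseract at it with --tessdata-dir leaves the installed LSTM-only data untouched; replacing eng.traineddata in the existing tessdata folder should work as well. If your installation includes the combine_tessdata tool, running combine_tessdata -d on a traineddata file lists its components, which is one way to see whether the legacy parts are present.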

user898678 answered Dec 28 '22 08:12
