Tesseract - change language file location

I am building an AIR project that needs some OCR capabilities, so I decided to use Tesseract (for now I am trying to get it working on Windows).

My problem is that I cannot change the location of the language file: Tesseract always looks in its installation directory (Program Files (x86)\Tesseract-OCR\tessdata\mylang.traineddata).

Is there a way to configure Tesseract to look for this file where I specify, for example in the same folder as tesseract.exe? I don't want to (or perhaps even can't) install an application with the AIR installer. I've tried it with version 3.0 and the latest SVN version.

Thanks

sydd asked Aug 05 '11 03:08


2 Answers

Yes, you can, by setting the TESSDATA_PREFIX environment variable, e.g.:

export TESSDATA_PREFIX=/usr/local/share/

Note that the directory path must end in a /.
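
Since a missing trailing slash is easy to overlook, here is a minimal shell sketch that appends one only when needed before exporting the variable. The variable name prefix is only illustrative, and the path is the one from the example above; substitute your own location.

```shell
# Illustrative value taken from the example above; replace with your own path.
# As far as I know, Tesseract 3.x expects the directory that CONTAINS tessdata/,
# while newer releases may expect the tessdata directory itself.
prefix="/usr/local/share"

# Append the trailing slash only if it is missing.
case "$prefix" in
  */) ;;
  *)  prefix="$prefix/" ;;
esac

export TESSDATA_PREFIX="$prefix"
echo "TESSDATA_PREFIX=$TESSDATA_PREFIX"
```

On Windows, the cmd.exe counterpart of export is set, and an application that launches tesseract.exe itself can usually set the variable for the child process only instead of changing it system-wide.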

nguyenq answered Sep 30 '22 14:09


I suggest you don't set the tessdata path through TESSDATA_PREFIX. You can pass the tessdata path when you initialize Tesseract instead. If you run tesseract.exe from the command line, use the following syntax:

tesseract.exe --tessdata-dir tessdataPath image.png output -l eng

If you use tesseract::TessBaseAPI, pass the path to api->Init() as follows:

api->Init(tessdataPath, language); // e.g. api->Init("C:", "eng")
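
For the command-line form, it can help to confirm that the language file really is where you point --tessdata-dir before running OCR. The following is a rough sketch, not a definitive script: has_lang_data is a hypothetical helper, and ./tessdata, image.png and output are placeholder names.

```shell
# Hypothetical helper: succeeds if <dir>/<lang>.traineddata exists.
has_lang_data() {
  [ -f "$1/$2.traineddata" ]
}

# Placeholder values; adjust to your own layout.
tessdata_dir="./tessdata"
lang="eng"

if has_lang_data "$tessdata_dir" "$lang"; then
  # Same invocation as above, with the directory passed explicitly.
  tesseract --tessdata-dir "$tessdata_dir" image.png output -l "$lang"
else
  echo "missing $tessdata_dir/$lang.traineddata" >&2
fi
```

Checking first gives a clearer message than a failed Tesseract start when the directory is wrong.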
Nigje answered Sep 30 '22 15:09