
Failed to initialise tesseract engine. Can't find correct version

Tags:

c#

ocr

I have an issue initializing the Tesseract engine. It fails with the following exception:

Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details.

I did some research and found that the language files must match the engine version, which in my case should be 3.0.2 (I think). Looking in Visual Studio, I noticed that the installed .NET wrapper is version 3.0.2, but the native files loaded into my project are named libtesseract304.dll (which I think means version 3.04), and packages.config lists version 3.0.2.0:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Tesseract" version="3.0.2.0" targetFramework="net40" />
</packages>

...and finally, the only language pack version I can find on GitHub is 3.04.

Can anyone tell me where to find a language pack for version 3.0.2 or a .NET wrapper for version 3.04, or otherwise point me toward a way to solve this issue?

I'm using Visual Studio 2012 on Windows 7 Service Pack 1.

Tihomir Blagoev asked Jul 25 '16 11:07



1 Answer

First, make sure the DLLs inside the x64 and x86 folders are set to "Copy Always" (or "Copy if newer"). These DLLs are added to the project when you install the Tesseract package via NuGet.

Also, make sure the files inside the tessdata folder are set to "Copy Always" as well.

This ensures that these folders and their files are copied to the executing assembly's folder (e.g. bin/Debug or bin).
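For reference, the "Copy Always" setting ends up in the project file as a CopyToOutputDirectory element. The fragment below is only a sketch of what the .csproj entries might look like; the wildcard patterns and the eng.traineddata file name are assumptions, so adjust them to the files actually present in your project.

```xml
<!-- Sketch of .csproj entries; file names and wildcards are assumptions. -->
<ItemGroup>
  <!-- Native DLLs that the Tesseract NuGet package places in x86 and x64 -->
  <Content Include="x86\*.dll">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
  </Content>
  <Content Include="x64\*.dll">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
  </Content>
  <!-- Language data; use PreserveNewest for the "Copy if newer" behaviour -->
  <Content Include="tessdata\eng.traineddata">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
  </Content>
</ItemGroup>
```

After a rebuild, the x86, x64 and tessdata folders should appear next to the compiled assembly.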

Finally, make sure you are passing the proper path when instantiating the TesseractEngine class. I usually keep the code in a class library, use it from a console application during development, and later reuse it in an ASP.NET web application. One way to make sure the path is correct regardless of which project is executing is:

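// Needs: using System; using System.IO; using System.Reflection; using Tesseract;
// CodeBase is a file:// URI, so convert it to a local path rather than stripping the "file:\" prefix by hand.
var assemblyDir = Path.GetDirectoryName(new Uri(Assembly.GetExecutingAssembly().CodeBase).LocalPath);
var path = Path.Combine(assemblyDir, "tessdata");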
using (var engine = new TesseractEngine(path, "eng", EngineMode.Default))
{
    engine.SetVariable("tessedit_char_whitelist", "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    engine.SetVariable("tessedit_unrej_any_wd", true);

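    // bitmap (the System.Drawing.Bitmap to recognize) and res (a string) are assumed to be declared by the caller.
    using (var page = engine.Process(bitmap, PageSegMode.SingleLine))
        res = page.GetText();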
}
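If the engine still fails to initialise, it can help to confirm that the files really landed where the code looks for them. The following is a minimal sketch, assuming .NET Framework and the folder layout described above; TessdataProbe and FindProblems are hypothetical names, not part of the Tesseract wrapper, and the check only covers the tessdata folder and one language file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical helper (not part of the Tesseract wrapper): reports what is
// missing from the tessdata folder before TesseractEngine is constructed.
public static class TessdataProbe
{
    // Returns human-readable problems; an empty list means the layout looks right.
    public static List<string> FindProblems(string baseDir, string language)
    {
        var problems = new List<string>();
        var tessdata = Path.Combine(baseDir, "tessdata");
        if (!Directory.Exists(tessdata))
        {
            problems.Add("Missing folder: " + tessdata);
            return problems;
        }

        var trained = Path.Combine(tessdata, language + ".traineddata");
        if (!File.Exists(trained))
        {
            problems.Add("Missing language file: " + trained);
        }
        return problems;
    }

    public static void Main()
    {
        // Resolve the executing assembly's folder from its file:// URI.
        var baseDir = Path.GetDirectoryName(
            new Uri(Assembly.GetExecutingAssembly().CodeBase).LocalPath);

        foreach (var problem in FindProblems(baseDir, "eng"))
        {
            Console.WriteLine(problem);
        }
    }
}
```

Any line it prints points to a file or folder that was not copied to the output directory, which usually means the "Copy Always" setting is missing on that item.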

[Image: Visual Studio showing the Copy Always option]

Alisson Reinaldo Silva answered Oct 20 '22 07:10
