Tesseract 3 (OCR) - .NET Wrapper

http://code.google.com/p/tesseractdotnet/

I'm having trouble getting Tesseract to work in my Visual Studio 2010 projects. I have tried both a console app and a WinForms app, with the same result. I came across a DLL built by someone who claims to have it working in VS2010:

http://code.google.com/p/tesseractdotnet/issues/detail?id=1

I am adding a reference to the DLL found in the attachment to post 64 on the page above. Every time I run the project I get an AccessViolationException saying that an attempt was made to read or write protected memory.

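using System.Drawing;
// Also add a using directive for the tesseractdotnet wrapper's namespace
// (whatever namespace the referenced DLL exposes).

public class OcrRunner
{
    // Returns the recognized text, or null if Init fails.
    public string StartOCR(string fileName)
    {
        const string language = "eng";
        // Location of the tessdata folder, with a trailing backslash
        const string TessractData = @"C:\Users\Joe\Desktop\tessdata\";

        using (TesseractProcessor processor = new TesseractProcessor())
        {
            // The AccessViolationException is thrown from this Init call
            if (!processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT))
                return null;

            using (Bitmap bmp = new Bitmap(fileName))
            {
                return processor.Recognize(bmp);
            }
        }
    }
}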

The access violation always points to the processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT) call. I've tried the usual suggestions, setting the solution platform to x86 in Configuration Manager and making sure the tessdata folder path ends with a trailing slash, to no avail. Any ideas?
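Because the failure shows up as a native access violation inside Init rather than a readable error, it can help to check the two suggestions above in managed code before calling Init. The following is a minimal sketch; the class TessPreflight and its methods are hypothetical helpers, not part of the tesseractdotnet wrapper, and the platform check assumes the wrapper's native DLL is 32-bit (the usual reason the x86 setting is suggested).

```csharp
using System;

// Hypothetical pre-flight checks to run before TesseractProcessor.Init;
// not part of the tesseractdotnet wrapper.
public static class TessPreflight
{
    // Returns the tessdata path with a trailing backslash appended if it was
    // missing (Windows-style paths, as in the question).
    public static string EnsureTrailingSlash(string tessdataPath)
    {
        if (string.IsNullOrEmpty(tessdataPath))
            throw new ArgumentException("tessdata path is empty", "tessdataPath");
        return tessdataPath[tessdataPath.Length - 1] == '\\'
            ? tessdataPath
            : tessdataPath + "\\";
    }

    // Returns a warning if the process is 64-bit, since the wrapper's native
    // DLL is assumed to be 32-bit; returns null otherwise.
    public static string PlatformWarning(bool is64BitProcess)
    {
        return is64BitProcess
            ? "Process is 64-bit; set the platform target to x86 before loading the wrapper."
            : null;
    }
}
```

In the application, call TessPreflight.PlatformWarning(Environment.Is64BitProcess) (a .NET 4 property) and pass TessPreflight.EnsureTrailingSlash(...) as the first argument to Init.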

Asked by Jpin on Apr 08 '12 22:04



2 Answers

The problem turned out to be the contents of the tessdata folder. I downloaded the tessdata folder from the first link and everything now works.
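Since the fix here was replacing the tessdata folder, a quick check that the folder actually holds the data file for the requested language can rule out the most obvious case early. A minimal sketch, assuming the Tesseract 3 naming convention of one <lang>.traineddata file per language (eng.traineddata for "eng"); TessdataCheck.MissingLanguageFile is a hypothetical helper. It only checks that the file exists and cannot tell whether the data matches the engine version the wrapper was built against.

```csharp
using System.IO;

// Hypothetical helper, not part of the tesseractdotnet wrapper.
public static class TessdataCheck
{
    // Returns the full path of the expected "<language>.traineddata" file if
    // it (or the folder itself) is missing, or null if the file is present.
    public static string MissingLanguageFile(string tessdataDir, string language)
    {
        string expected = Path.Combine(tessdataDir, language + ".traineddata");
        return File.Exists(expected) ? null : expected;
    }
}
```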

Answered by Jpin on Oct 01 '22 17:10


I have just completed a project with Tesseract engine 3. I think there is a bug in the engine that needs to be fixed. What I did to get rid of the AccessViolationException was append "\tessdata" to the real tessdata directory string. I don't know why, but the engine seems to truncate the innermost directory of the tessdata path.
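The workaround described above amounts to passing the real tessdata directory with an extra "\tessdata" appended, for example C:\Users\Joe\Desktop\tessdata\tessdata for the path in the question. A minimal sketch of that string manipulation follows; TessdataWorkaround.BuildInitPath is a hypothetical helper, and why it works rests only on the observation that the engine seems to drop the innermost directory, not on documented behavior.

```csharp
// Hypothetical helper, not part of the tesseractdotnet wrapper.
public static class TessdataWorkaround
{
    // Appends "\tessdata" to the real tessdata directory, following the
    // workaround above. Trailing backslashes are trimmed first so the result
    // is the same whether or not the input ends with one.
    public static string BuildInitPath(string realTessdataDir)
    {
        return realTessdataDir.TrimEnd('\\') + @"\tessdata";
    }
}
```

The result would then be passed as the first argument to processor.Init in place of the original path.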

I have also put together a full OCR package (DLLs + English tessdata) that works with .NET Framework 4.

Answered by Umar Hassan on Oct 01 '22 16:10