Hi Can you anyone give me a simple example of testing Tesseract OCR preferably in C#.
I tried the demo found here. I download the English dataset and unzipped in C drive. and modified the code as followings:
string path = @"C:\pic\mytext.jpg"; Bitmap image = new Bitmap(path); Tesseract ocr = new Tesseract(); ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only ocr.Init(@"C:\tessdata\", "eng", false); // To use correct tessdata List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty); foreach (tessnet2.Word word in result) Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
Unfortunately the code doesn't work. the program dies at "ocr.Init(..." line. I couldn't even get an exception even using try-catch.
I was able to run the vietocr! but that is a very large project for me to follow. i need a simple example like above.
While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance; its ability to correctly recognize characters in a scan or image.
Create a Python tesseract script Create a project folder and add a new main.py file inside that folder. Once the application gives access to PDF files, its content will be extracted in the form of images. These images will then be processed to extract the text.
Tesseract is performing well for high-resolution images. Certain morphological operations such as dilation, erosion, OTSU binarization can help increase pytesseract performance. EasyOCR is lightweight model which is giving a good performance for receipt or PDF conversion.
Ok. I found the solution here tessnet2 fails to load the Ans given by Adam
Apparently i was using wrong version of tessdata. I was following the the source page instruction intuitively and that caused the problem.
it says
Quick Tessnet2 usage
Download binary here, add a reference of the assembly Tessnet2.dll to your .NET project.
Download language data definition file here and put it in tessdata directory. Tessdata directory and your exe must be in the same directory.
After you download the binary, when you follow the link to download the language file, there are many language files. but none of them are right version. you need to select all version and go to next page for correct version (tesseract-2.00.eng)! They should either update download binary link to version 3 or put the the version 2 language file on the first page. Or at least bold mention the fact that this version issue is a big deal!
Anyway I found it. Thanks everyone.
A simple example of testing Tesseract OCR in C#:
public static string GetText(Bitmap imgsource) { var ocrtext = string.Empty; using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default)) { using (var img = PixConverter.ToPix(imgsource)) { using (var page = engine.Process(img)) { ocrtext = page.GetText(); } } } return ocrtext; }
Info: The tessdata folder must exist in the repository: bin\Debug\
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With