How to speed up Tesseract OCR

Tags: c#, .net, ocr, tesseract

I'm trying to OCR a lot of documents (in the range of 300k+ a day). At the moment I'm using the Tesseract wrapper for .NET. The quality is fine, but the speed is not good enough. With 20 tasks running in parallel, each scanning half a page from the same PDF, the average time is 2.546 seconds per scan. The code I'm using:

using (var engine = new TesseractEngine(Tessdata, "eng", EngineMode.TesseractOnly))
{
    var page = engine.Process(image, srcRect);
    var text = page.GetText();
    return Task.FromResult(text);
}

That average is measured after halving the image resolution and converting the image to grayscale. Any ideas on how to speed up the process? I don't need the text segmented, just the text as a single line. Should I perhaps use something like MATLAB from C#?

asked Jun 02 '17 07:06

TestzWCh

1 Answer

Currently, you create a new TesseractEngine object for each page you scan. Creating the engine is costly because it reads the 'tessdata' files.

You say you have 20 parallel tasks running. Since an engine cannot process multiple pages at once, you will need to create one engine per task and reuse it for all the pages that task processes. To process the next page with an existing engine, simply call using (var page = Engine.Process(pix)); the using block disposes the page, which has to happen before the same engine can process another one.

Reusing the engine should significantly improve performance because you'll only have to create 20 engines instead of 300k.
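
As a rough sketch of the one-engine-per-task idea, something like the following could work. It assumes the Tesseract NuGet wrapper used in the question; the class name OcrBatch, the method Run, and the parameters imagePaths, tessdataPath and workerCount are made up for illustration and are not from the original post. It also assumes the pages are available as image files on disk, which may not match your pipeline.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Tesseract;

static class OcrBatch
{
    // Hypothetical helper: OCRs every file in imagePaths using workerCount engines.
    public static ConcurrentDictionary<string, string> Run(
        IEnumerable<string> imagePaths, string tessdataPath, int workerCount = 20)
    {
        var queue = new ConcurrentQueue<string>(imagePaths);
        var results = new ConcurrentDictionary<string, string>();

        var workers = Enumerable.Range(0, workerCount).Select(_ => Task.Run(() =>
        {
            // One engine per worker: the tessdata files are read once here,
            // not once per page.
            using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.TesseractOnly))
            {
                string path;
                while (queue.TryDequeue(out path))
                {
                    using (var pix = Pix.LoadFromFile(path))
                    // The page is disposed at the end of this block, so the
                    // same engine is free to process the next image.
                    using (var page = engine.Process(pix))
                    {
                        results[path] = page.GetText();
                    }
                }
            }
        })).ToArray();

        Task.WaitAll(workers);
        return results;
    }
}
```

Each worker pulls paths from a shared queue until it is empty, so the expensive engine construction happens workerCount times in total regardless of how many pages are processed. Whether 20 workers is the right number depends on how many cores you have; that figure is only carried over from the question.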

answered Oct 24 '22 05:10

GWigWam
