
Tesseract OCR ignoring "-"

In my application, I am reading text from an image that contains digits and letters separated by hyphens (-).

For example: 1-TT88TY5-AD5G

However, Tesseract ignores the hyphens and returns 1TT88TY5AD5G.

How can I force it to read the hyphens too?

Here's my initial code:

Tesseract* tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
[tesseract setVariableValue:@"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" forKey:@"tessedit_char_whitelist"];
Shradha asked May 24 '26 17:05


1 Answer

I'm pretty much guessing here since I haven't used Tesseract, but shouldn't the - be in the whitelist?

[tesseract setVariableValue:@"-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" forKey:@"tessedit_char_whitelist"];
                              ^
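
To put the suggestion in context, here is a minimal sketch of the whole recognition flow with the hyphen added to the whitelist. It assumes the wrapper in use is the tesseract-ios one and that it exposes setImage:, recognize and recognizedText alongside the setVariableValue:forKey: call from the question; the helper name RecognizeSerial is hypothetical. Treat it as something to try, not a confirmed fix, since a hyphen can also be dropped for reasons unrelated to the whitelist (for example a thin or low-contrast stroke in the image).

```objc
#import <UIKit/UIKit.h>
#import "Tesseract.h"  // assumed: header of the tesseract-ios wrapper

// Hypothetical helper: runs OCR on an image and returns the recognized text.
static NSString *RecognizeSerial(UIImage *image) {
    Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata"
                                                      language:@"eng"];

    // Same whitelist as in the question, plus "-" so hyphens are allowed.
    NSString *whitelist =
        @"-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    [tesseract setVariableValue:whitelist forKey:@"tessedit_char_whitelist"];

    // Assumed wrapper calls: hand over the image, run recognition, read the result.
    [tesseract setImage:image];
    [tesseract recognize];
    return [tesseract recognizedText];
}
```

Calling RecognizeSerial with the image from the question and logging the result with NSLog shows whether the hyphens now come through.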
James Webster answered May 26 '26 07:05


