For the past 3 months I've been trying to train Tesseract
to recognize a collection of images I have. Due to a real lack
of proper documentation and a very high level of complexity, I'm starting to
give up on Tesseract as a solution.
I'm looking for an alternative that would be relatively pain-free
to train; I'm not looking to reinvent the wheel here.
If there isn't anything free, I guess paid solutions would
have to do (nothing above $200).
Based on your comment, all you need is to scan a relatively small number of documents with almost 100% accuracy, and your budget is about $200.
Well, the answer is simple then. You don't need a programming solution. Just buy a quality commercial OCR product, e.g. ABBYY FineReader (disclaimer: I work for ABBYY). Its price differs by region, but I guess it is somewhere around your budget.
A commercial desktop OCR product will give you almost 100% accuracy out of the box on typical languages. These products also have convenient manual verification tools to fix any remaining errors. Typically they support a wide variety of modern fonts, and if your font is unusual, they have a font training utility for that.
I do think that is the optimal solution for you.
UPDATE: Linux platform. Unfortunately, there is almost no choice of high-quality OCR products for Linux, sorry. The only one I know of is from ABBYY (http://ocr4linux.com/en:start), but it does not have a UI, verification, or font training. At least you can give it a try to see whether it gives you good enough accuracy as is, which may well be the case.