 

Tesseract or any other OCR lib

I'm looking for an explanation, API documentation, or examples of how to use (and possibly train) Tesseract in C++. There is nothing useful on the Google Tesseract page, and I have yet to find anything elsewhere on the web.

Any useful sources or experiences would be more than welcome, as I have no idea how to begin.

P.S:

  1. I'm open to suggestions for other libraries.
  2. Only FREE libraries, please.
asked Nov 30 '10 by snoofkin


1 Answer

I have some experience with Tesseract. A simple Google search for 'training tesseract' turns up this page: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract, where you choose which version of Tesseract you want to train. Version 3 is the latest, but it is brand new, so people are still ironing out issues; I'm still using version 2.4. In any case, training Tesseract for a particular 'language' (which really should have been called a 'font' or 'character set') takes about 9 steps. You could also just use the existing 'eng' language, but that depends on your application. In mine, for example, I had to do the document analysis myself, take a particular region, and OCR a 13-character string of digits. I needed high accuracy and didn't want Tesseract reading '5' as 'S' or '0' as 'O', so it made sense to create a dedicated 'language' for my font set consisting only of the characters 0..9, whereas you might not care if you get extra 'noise'.

answered Oct 23 '22 by Richard Woolf