What OCR options exist beyond Tesseract? [closed]

Tags: python, php, ruby, ocr, tesseract

I've used Tesseract a bit and its results leave much to be desired. I'm currently running OCR on very small images (35x15 pixels, without a border; I've also tried adding one with ImageMagick, with no OCR improvement). They contain 2 to 5 characters in a fairly consistent font, but the characters vary enough that simply matching on an image-size checksum or similar is not going to work.

What options exist for OCR besides sticking with Tesseract or doing a complete custom training of it? It would also be VERY helpful if the option were compatible with Heroku-style hosting (at least where I can compile the binaries and shove them over).


asked Mar 13 '12 19:03

ylluminate

1 Answer

I have successfully used GOCR in the past for OCR on small images. On fairly regular fonts, once the grayscale options were set properly, I would say accuracy was around 85%. It fails miserably when the fonts get complicated and has trouble with multi-line layouts.
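GOCR can be scripted from Ruby much like Ocropus. The sketch below is a minimal wrapper, assuming the `-i` (input file), `-l` (grey-level threshold, 0 for automatic) and `-C` (restrict recognition to the given characters) flags from GOCR's usage text; check them against `gocr -h` for your build. `gocr_args` and `gocr_text` are hypothetical helper names, and depending on how GOCR was built it may only read PNM/PGM input directly.

```ruby
require 'open3'

# Hypothetical helper: builds the gocr argument list.
# Assumed flags (verify with `gocr -h`):
#   -i <file>   input image
#   -l <0-255>  grey-level threshold (0 lets gocr pick one)
#   -C <chars>  only recognize characters from this string
def gocr_args(path, grey_level: nil, chars: nil)
  args = ['gocr', '-i', path]
  args += ['-l', grey_level.to_s] if grey_level # 0 is truthy in Ruby, so -l 0 is passed
  args += ['-C', chars] if chars
  args
end

# Hypothetical helper: runs gocr and returns its stdout with outer whitespace removed.
def gocr_text(path, **opts)
  out, status = Open3.capture2(*gocr_args(path, **opts))
  raise "gocr exited with status #{status.exitstatus}" unless status.success?
  out.strip
end
```

For example, `gocr_text('sample.pgm', grey_level: 140, chars: '0123456789')`; the grey-level threshold is presumably where the grayscale tuning mentioned above comes in. Passing the command as an argument array to `Open3.capture2` avoids shell quoting problems with file names.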

Also have a look at Ocropus, which is maintained by Google. It's related to Tesseract, but from what I understand, its OCR engine is different. With just the default models included, it achieves nearly 99% accuracy on high-quality images, handles layout pretty well, and produces HTML output with information about formatting and lines. In my experience, however, its accuracy is very low when the image quality is poor. That said, training it is relatively simple, so you might want to give it a try.
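The answer doesn't say how the 85% and 99% figures were measured. One common way to score OCR output on your own images (an assumption here, not the answerer's method) is character accuracy: 1 minus the Levenshtein distance between the recognized text and the ground truth, divided by the ground-truth length. A minimal Ruby sketch, with hypothetical helper names:

```ruby
# Levenshtein distance: the minimum number of single-character insertions,
# deletions and substitutions that turn string a into string b.
def edit_distance(a, b)
  prev = (0..b.length).to_a # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,            # delete ca
               curr[j - 1] + 1,        # insert cb
               prev[j - 1] + cost].min # substitute (or match)
    end
    prev = curr
  end
  prev.last
end

# Character accuracy in [0.0, 1.0]; clamped at 0 because the edit distance
# can exceed the ground-truth length.
def char_accuracy(recognized, truth)
  return (recognized.empty? ? 1.0 : 0.0) if truth.empty?
  [1.0 - edit_distance(recognized, truth).fdiv(truth.length), 0.0].max
end
```

Averaging `char_accuracy` over a labeled sample of your 35x15 images gives a like-for-like comparison of GOCR, Ocropus and Tesseract on your actual data.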

Both of them are easily callable from the command line. GOCR usage is very straightforward: just type gocr -h and you should have all the information you need. Ocropus is a bit trickier; here's a usage example in Ruby:

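    require 'tmpdir'

    file = 'file.png' # image to recognize

    # Dir.mktmpdir creates a scratch directory, yields its path, deletes it
    # when the block finishes and returns the block's value.
    text = Dir.mktmpdir do |tmp|
      `ocropus book2pages #{tmp}/out #{file}`             # split the input into pages
      `ocropus pages2lines #{tmp}/out`                    # segment pages into text lines
      `ocropus lines2fsts #{tmp}/out`                     # run the line recognizer
      `ocropus buildhtml #{tmp}/out > #{tmp}/output.html` # assemble the HTML output
      File.read("#{tmp}/output.html")
    end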
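In the Ocropus example, `text` ends up holding the HTML written by `buildhtml`, not plain text. A naive way to flatten it, assuming the recognized text sits in ordinary elements with `<br>`, `</p>` or `</div>` marking line ends (the exact markup OCRopus emits is not confirmed here), is to drop the `<head>`, strip the remaining tags and decode a handful of common entities. `html_to_text` is a hypothetical helper, not part of OCRopus:

```ruby
# Common HTML entities and the characters they encode (not an exhaustive list).
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
  '&quot;' => '"', '&#39;' => "'", '&nbsp;' => ' '
}.freeze

# Hypothetical helper: naive HTML-to-text conversion for OCR output.
def html_to_text(html)
  text = html.sub(%r{<head\b.*?</head>}im, '')               # drop <head> (title, styles)
  text = text.gsub(%r{<(br|/p|/div)\b[^>]*>}i, "\n")         # line/paragraph ends -> newlines
  text = text.gsub(/<[^>]*>/, '')                            # remove all remaining tags
  text = text.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES) # decode entities in one pass
  text.lines.map(&:strip).reject(&:empty?).join("\n")         # tidy whitespace, drop blank lines
end
```

A regex-based approach like this breaks on unusual markup; if the HTML turns out to be more complex, a real HTML parser is the safer choice.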


answered Sep 22 '22 11:09

user2398029
