
Java OCR library recommendations? [duplicate]

Tags:

java

ocr

I need to check a tonne of pictures to see if they have a keyword on them. Can anyone recommend a good, reliable OCR library? I'll happily sacrifice speed for accuracy.

Peck3277 asked Jul 23 '13 11:07


People also ask

Is Tesseract OCR free?

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License.

Is Tesseract an OCR tool?

Tesseract is an open source optical character recognition (OCR) platform. OCR extracts text from images and documents without a text layer and outputs the document into a new searchable text file, PDF, or most other popular formats.

What is Tess4J?

Tess4J is a Java wrapper for the Tesseract APIs that provides OCR support for various image formats like JPEG, GIF, PNG, and BMP.


1 Answer

There are no pure Java OCR libraries whose accuracy is worth mentioning. Depending on your budget, you may choose something that is not purely Java but can be called from Java:

  • If you have plenty of time but zero budget, your choice is Tesseract. It is definitely the best among the open-source engines.
  • If you have a small budget and only need to run this recognition once, a Cloud OCR API service would be your best choice. It is based on a leading commercial-grade OCR engine and offers quite affordable per-project prices. Disclaimer: I work for ABBYY.
  • If you need to run this recognition as an ongoing process, it may be more economical to purchase dedicated conversion software, for example this one; it has an API and can be called from Java too. There are actually a lot of alternatives if you are prepared to invest some budget in licensing.
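
For the zero-budget route, here is a minimal sketch of how the keyword check might be wired up from Java. It is not a definitive implementation. It assumes the `tesseract` command-line binary is installed and on the PATH, and it shells out to it instead of using a wrapper such as Tess4J. The class name `KeywordScanner` and its methods are hypothetical names chosen for illustration. The OCR step is passed in as a plain function, so a cloud or commercial engine could be swapped in without touching the matching logic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical helper: scans images for a keyword using any OCR function.
final class KeywordScanner {

    // Runs `tesseract <image> stdout` and returns the recognized text.
    // Assumption: the tesseract binary is installed and on the PATH.
    static String runTesseractCli(Path image) {
        try {
            Process process = new ProcessBuilder("tesseract", image.toString(), "stdout")
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            String text = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }
            return text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tesseract", e);
        }
    }

    // Lower-cases the text and collapses whitespace runs (including line
    // breaks) into single spaces, so layout differences do not block a match.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    // True if the normalized OCR text contains the normalized keyword.
    static boolean containsKeyword(String ocrText, String keyword) {
        return normalize(ocrText).contains(normalize(keyword));
    }

    // Returns the images whose recognized text contains the keyword.
    static List<Path> findMatches(List<Path> images, String keyword,
                                  Function<Path, String> ocr) {
        List<Path> matches = new ArrayList<>();
        for (Path image : images) {
            if (containsKeyword(ocr.apply(image), keyword)) {
                matches.add(image);
            }
        }
        return matches;
    }
}
```

A call such as `KeywordScanner.findMatches(images, "keyword", KeywordScanner::runTesseractCli)` would then return the matching files. An exact substring test will miss keywords that the engine misreads, so with noisy pictures you may want to preprocess the images or use a fuzzier comparison.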
Tomato answered Oct 03 '22 04:10