I need to check a tonne of pictures to see if they have a keyword on them. Can anyone recommend a good, reliable OCR library? I'll happily sacrifice speed for accuracy.
Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License.
Tesseract is an open source optical character recognition (OCR) platform. OCR extracts text from images and from documents that have no text layer, and writes the result to a searchable text file, a PDF, or most other popular formats.
Tess4J is a Java wrapper for the Tesseract APIs that provides OCR support for various image formats like JPEG, GIF, PNG, and BMP.
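For the "does this picture contain a keyword" use case, here is a minimal sketch of the matching step once you have the OCR text. The class `KeywordScanner` and its methods are hypothetical names of mine, not part of Tess4J, and the OCR call is passed in as a plain function so the example runs without Tesseract installed. With Tess4J you would presumably plug in something built on `new Tesseract().doOCR(new File(path))`, wrapped to handle its checked `TesseractException`.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

final class KeywordScanner {

    // Lower-case the text and collapse whitespace runs, so that line breaks
    // and inconsistent spacing in OCR output do not split a match.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    // True if the normalized OCR text contains the normalized keyword.
    static boolean containsKeyword(String ocrText, String keyword) {
        return normalize(ocrText).contains(normalize(keyword));
    }

    // 'ocr' is a stand-in (assumption) for the real OCR call: it maps an
    // image path to the recognised text. With Tess4J this would wrap
    // ITesseract.doOCR(new File(path)) and handle TesseractException.
    static List<String> filesWithKeyword(List<String> paths,
                                         Function<String, String> ocr,
                                         String keyword) {
        return paths.stream()
                .filter(p -> containsKeyword(ocr.apply(p), keyword))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Fake OCR results, for demonstration only.
        Function<String, String> fakeOcr =
                p -> p.equals("a.png") ? "Big SALE\ntoday" : "nothing here";
        System.out.println(
                filesWithKeyword(List.of("a.png", "b.png"), fakeOcr, "sale"));
    }
}
```

Plain substring matching is the simplest option. Since OCR output often contains misread characters, you may want a fuzzy comparison instead if exact matches miss too many pictures.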
There are no pure Java OCR libraries that offer good accuracy. Depending on your budget, you may choose something that is not pure Java but can be called from Java: