I'm looking for an open source OCR library that runs on Linux. It needs to work with PNGs and PDFs, and I would mostly like to call it from Java or Ruby. Is anything like this available?
Regards.
Yes, several programs solve this problem, and many of them are both free and open source. Optical character recognition (OCR) software converts non-editable files, such as scanned PDFs or images, into editable text.
Many GUI front-ends are built on the Tesseract project. One of them, gImageReader, runs on both Linux and Windows.
Tesseract is a free and open source command-line OCR engine. It was originally developed at Hewlett-Packard starting in the mid-1980s, and Google has sponsored its development since 2006. It is well documented and written in C and C++.
Tesseract is a very good OCR engine: https://github.com/tesseract-ocr/tesseract
The project was started at HP Labs and is now continued and sponsored by Google (for Google Books!). It is released under the Apache license, and it runs on Linux. It reads image files such as TIFF or PNG; PDFs must first be converted to one of these formats. I don't know of an official Java or Ruby binding, so the simplest approach is to invoke the program as a subprocess, although third-party wrappers such as Tess4J for Java do exist.
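Invoking Tesseract as a subprogram, as suggested above, might look roughly like this from Java. This is only a sketch under a few assumptions: the `tesseract` binary is installed and on the PATH, the `eng` language data is available, and the class and method names (`TesseractCli`, `buildCommand`, `recognize`) are made up for illustration. It relies on the command-line form `tesseract <image> stdout -l <lang>`, which prints the recognized text to standard output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not part of Tesseract.
final class TesseractCli {

    // Builds the command line: tesseract <image> stdout -l <lang>
    static List<String> buildCommand(Path image, String lang) {
        return List.of("tesseract", image.toString(), "stdout", "-l", lang);
    }

    // Runs tesseract as a child process and returns the text it prints.
    static String recognize(Path image, String lang)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, lang))
                // Discard diagnostics so an unread stderr pipe cannot block the child.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String text = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java TesseractCli.java <image.png> [lang]");
            return;
        }
        // Assumption: default to English when no language is given.
        String lang = args.length > 1 ? args[1] : "eng";
        System.out.print(recognize(Path.of(args[0]), lang));
    }
}
```

For a PDF, each page would first have to be rendered to PNG or TIFF (for example with a tool such as pdftoppm from Poppler) and then passed to `recognize` one image at a time.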