 

Open source OCR [closed]

I'm looking for an open source OCR library that runs on Linux. I need it to work with PNGs and PDFs. Ideally I would like to call this library from Java or Ruby. Any idea whether there is anything available?

Regards.

Chris asked Mar 01 '11 07:03




1 Answer

Tesseract is a very good OCR engine: https://github.com/tesseract-ocr/tesseract

The project was started at HP Labs and is now maintained and sponsored by Google (for Google Books!). It is released under the Apache License and runs on Linux. It takes TIFF or PNG files as input; for PDFs, you will need to convert the pages to one of these formats first. I suppose there is no binding, so you would invoke it as a subprocess.
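A minimal Java sketch of that subprocess approach, since the question mentions Java. It assumes the `tesseract` binary and `pdftoppm` (from poppler-utils, used here for the PDF-to-PNG step; any other converter would do) are installed and on the PATH, and that the English language data (`eng`) is installed. The class and method names are hypothetical, and rendering at 300 DPI is just a common choice, not a requirement. Passing `stdout` as the output base makes `tesseract` print the recognized text instead of writing a `.txt` file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: shells out to the tesseract and pdftoppm CLIs (assumed to be on PATH).
public class TesseractOcr {

    // "stdout" as the output base makes tesseract print the text instead of writing a file
    static List<String> tesseractCommand(Path image, String language) {
        return List.of("tesseract", image.toString(), "stdout", "-l", language);
    }

    // pdftoppm renders each PDF page to <prefix>-<page>.png at the given resolution (DPI)
    static List<String> pdfToPngCommand(Path pdf, Path outputPrefix, int dpi) {
        return List.of("pdftoppm", "-png", "-r", Integer.toString(dpi),
                pdf.toString(), outputPrefix.toString());
    }

    // Runs a command, returns its stdout decoded as UTF-8, and fails on a non-zero exit status
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // forward warnings to our stderr
        Process process = pb.start();
        String output;
        try (InputStream out = process.getInputStream()) {
            output = new String(out.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
        return output;
    }

    static String ocrImage(Path image, String language) throws IOException, InterruptedException {
        return run(tesseractCommand(image, language));
    }

    // Renders every page to PNG in a temporary directory, then OCRs the pages in order
    static String ocrPdf(Path pdf, String language) throws IOException, InterruptedException {
        Path tmpDir = Files.createTempDirectory("ocr-pages");
        run(pdfToPngCommand(pdf, tmpDir.resolve("page"), 300));
        List<Path> pages;
        try (Stream<Path> files = Files.list(tmpDir)) {
            // pdftoppm zero-pads page numbers to equal width, so name order is page order
            pages = files.filter(p -> p.getFileName().toString().endsWith(".png"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        StringBuilder text = new StringBuilder();
        for (Path page : pages) {
            text.append(ocrImage(page, language));
            Files.delete(page);
        }
        Files.delete(tmpDir);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java TesseractOcr <file.png|file.tif|file.pdf> [language]");
            return;
        }
        Path input = Paths.get(args[0]);
        String language = args.length > 1 ? args[1] : "eng";
        boolean isPdf = input.getFileName().toString().toLowerCase().endsWith(".pdf");
        System.out.print(isPdf ? ocrPdf(input, language) : ocrImage(input, language));
    }
}
```

After compiling, `java TesseractOcr scan.pdf` should print the text of every page. Shelling out keeps the Java side free of native dependencies; the cost is one process per page, and in this sketch the temporary page images are left behind if a step fails.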

olivierlemasle answered Sep 22 '22 12:09