How does one install Tesseract-OCR 3.03 in Ubuntu/Linux distributions?

A friend and I are interested in training the Tesseract OCR engine for a CV project. We tried wrappers such as PyTesser and pyocr, but the results are not as accurate as we need. We therefore want to train Tesseract to perform better for our purpose (identifying text on food labels), but we are having trouble installing the training tools.

What we've tried:

On Google Code, the 'Compiling' page of Tesseract's wiki says the training tools are only available in version 3.03. However, the 'Downloads' page for tesseract-ocr only has files for 3.02. The comments at the bottom of the 'Compiling' page cover installing 3.03 on Windows and OS X, but none cover Linux yet.

There also appears to be a 3.03 source package for Ubuntu, but we're not sure how to get it onto our machines. The 'Compiling' page says we need to run these commands:

make training
sudo make training-install
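For reference, those two commands normally come at the end of the standard autotools build. The sketch below shows the whole sequence, assuming the 3.03 source tree is already unpacked (the default `SRC_DIR` of `./tesseract` is a hypothetical placeholder) and that Leptonica plus autoconf, automake and libtool are installed. The 3.03 training tools are generally reported to also need the ICU, Pango and Cairo development headers (on Ubuntu, packages such as libicu-dev, libpango1.0-dev and libcairo2-dev); treat that list as an assumption and check the 'Compiling' page for the authoritative one. The `run` helper and `DRY_RUN` switch are illustrative additions.

```shell
#!/bin/sh
# Sketch: build Tesseract 3.03 and its training tools from an unpacked source
# tree. SRC_DIR (default ./tesseract) is a hypothetical path; adjust it.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
set -e

SRC_DIR="${SRC_DIR:-./tesseract}"

# run: print the command (dry run, the default) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_tesseract() {
    run cd "$SRC_DIR"
    # Generate the configure script (needs autoconf, automake and libtool).
    run ./autogen.sh
    run ./configure
    run make
    run sudo make install
    # Refresh the shared-library cache so the new libtesseract is found.
    run sudo ldconfig
    # The training tools are separate make targets, per the 'Compiling' page.
    run make training
    run sudo make training-install
}

build_tesseract
```

Run it once as-is to review the commands, then with `DRY_RUN=0` (and `SRC_DIR` pointing at the real source tree) to execute them. Defaulting to a dry run keeps a copy-pasted script from invoking `sudo` unexpectedly.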

We also found a Google Groups thread about Tesseract 3.03, but again the posts do not seem to include advice for Linux users (unless we missed something on a first read).

Is this actually a simple command-line install problem? Or is there a way to train Tesseract with 3.02 (which we currently have installed)? Have we been looking in the wrong places for information?

Any advice or links to instructions for installing tesseract-ocr 3.03 for Linux distributions would be greatly appreciated! Thanks.

greenteawarrior asked Jun 13 '14 20:06




2 Answers

On Ubuntu 14.04, Tesseract can be installed directly with

sudo apt-get install tesseract-ocr

I don't know whether this works on older Ubuntu releases, since the repository may only have been updated in later releases.
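Because the question specifically needs 3.03 for the training tools, it is worth checking which version actually got installed. The sketch below compares the version reported by `tesseract --version` against 3.03. The helper names are hypothetical, it assumes the first output line looks like `tesseract 3.03`, and it relies on GNU `sort -V` (available on Ubuntu) for version ordering.

```shell
#!/bin/sh
# Sketch: check whether the installed tesseract is at least 3.03.
# Assumes `tesseract --version` starts with a line like "tesseract 3.03".

# version_at_least HAVE WANT: succeeds if HAVE >= WANT.
version_at_least() {
    # sort -V lists the smaller version first; if WANT comes first (or the
    # two are equal), HAVE is at least WANT.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_version: print the version number from `tesseract --version`.
# Older releases print the banner on stderr, so both streams are captured.
installed_version() {
    tesseract --version 2>&1 | head -n 1 | awk '{print $2}'
}

if command -v tesseract >/dev/null 2>&1; then
    v=$(installed_version)
    if version_at_least "$v" 3.03; then
        echo "tesseract $v meets the 3.03 requirement"
    else
        echo "tesseract $v is older than 3.03"
    fi
else
    echo "tesseract is not installed"
fi
```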

erluxman answered Nov 15 '22 20:11


I had an AWS Ubuntu 14.04 instance. When I tried installing Tesseract with

sudo apt-get install tesseract-ocr 

it returned "package not found".

But this worked for me.

sudo apt-get update
sudo apt-get install tesseract-ocr
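The fix here is to refresh the package index before installing. A slightly more defensive version of the same steps is sketched below: it also runs `apt-cache policy` to show which tesseract-ocr version the repositories offer (relevant, since the question needs 3.03) and `tesseract --version` to confirm the install. The `-y` flag, the `DRY_RUN` switch and the function names are illustrative additions, not part of the original answer.

```shell
#!/bin/sh
# Sketch: install tesseract-ocr from the Ubuntu archive on a fresh machine
# (e.g. a new cloud instance). Commands are only printed unless DRY_RUN=0.
set -e

# run: print the command (dry run, the default) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_tesseract() {
    # Refresh the package index first; on a freshly launched image it may be
    # empty or stale, and apt-get then cannot locate the package.
    run sudo apt-get update
    # Show which tesseract-ocr version the configured repositories offer.
    run apt-cache policy tesseract-ocr
    run sudo apt-get install -y tesseract-ocr
    # Confirm the binary is on PATH and report its version.
    run tesseract --version
}

install_tesseract
```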

Venkatesh Mondi answered Nov 15 '22 20:11