A friend and I are interested in training the Tesseract OCR engine for a CV project. We tried using wrappers such as PyTesser and pyocr, but the results are not as accurate as we need them to be. We therefore want to train Tesseract to perform better for our purposes (i.e. identifying text on food labels), but we are having trouble installing the training tools.
What we've tried:
The 'Compiling' page on Tesseract's Google Code wiki says the training tools are only available in version 3.03. However, the Google Code 'Downloads' page for tesseract-ocr only has the materials for 3.02. The bottom of the 'Compiling' page also has some comments about installing version 3.03 on Windows and OS X, but no comments yet for Linux users.
There also appears to be a 3.03 source package for Ubuntu, but we're not sure how to access it on our computers, and the 'Compiling' page says we need to run these commands:
make training
sudo make training-install
We've also found a Google Group thread about Tesseract 3.03, but again the posts do not seem to include advice for Linux users (unless we missed something during the initial read).
Is this actually a really simple command-line install problem? Or is there a way to train Tesseract with 3.02 (which we currently have installed)? Have we been looking in the wrong places for information?
Any advice or links to instructions for installing tesseract-ocr 3.03 for Linux distributions would be greatly appreciated! Thanks.
Installing Tesseract on Debian and Ubuntu: This will install Tesseract under /usr/share/tesseract-ocr/4.00/tessdata. Note: For other Linux distributions, jump to Install Tesseract from Sources. By default, Tesseract will install the English language pack.
Develop Tesseract: Download the latest SW (Software Network, https://software-network.org/) client from https://software-network.org/client/. SW is a source package distribution system. Add the SW client to PATH. Run sw setup (may require administrator access).
How to install Tesseract OCR on Linux? You can install Tesseract on Debian or Ubuntu Linux by using apt. This will install Tesseract under /usr/share/tesseract-ocr/4.00/tessdata. To install Tesseract on other Linux distributions, follow the instructions for installing from sources.
To install Tesseract on a Debian or Ubuntu Linux distribution, use apt. This will install Tesseract under /usr/share/tesseract-ocr/4.00/tessdata. Note: For other Linux distributions, jump to Install Tesseract from Sources.
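As a rough way to check which language packs ended up in the tessdata directory mentioned above, something like the following sketch could be used. The directory path is the one quoted in the text and will differ for other Tesseract versions or distributions; `list_langs` is a hypothetical helper name, not part of Tesseract.

```shell
#!/bin/sh
# Path taken from the text above; it varies by Tesseract version and distribution.
tessdata_dir="/usr/share/tesseract-ocr/4.00/tessdata"

# list_langs is a hypothetical helper: print one language code per
# *.traineddata file found in the given directory.
list_langs() {
    for f in "$1"/*.traineddata; do
        # Skip the unexpanded pattern when the directory has no matches.
        if [ -e "$f" ]; then
            basename "$f" .traineddata
        fi
    done
}

list_langs "$tessdata_dir"
```

If the directory does not exist or holds no language files, the script prints nothing. With only the default English pack installed, one would expect to see `eng` among the entries.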
On Ubuntu and Debian, Tesseract is installed under /usr/share/tesseract-ocr/4.00/tessdata. If you want to install Tesseract on other Linux distributions, jump to Install Tesseract from Sources.
While training used to last hours or days, training recent versions of Tesseract may take days, weeks, or even months, especially if you are looking for a multilingual OCR solution.
Tesseract can be installed directly on Ubuntu 14.04 using
sudo apt-get install tesseract-ocr
I don't know whether this works on older versions of Ubuntu, because the repository may only have been updated in later versions.
I had an AWS Ubuntu 14.04 instance. When I tried installing Tesseract with
sudo apt-get install tesseract-ocr
it returned "package not found".
But this worked for me:
sudo apt-get update
sudo apt-get install tesseract-ocr
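To confirm whether the install actually succeeded, a small check along these lines may help. It is only a sketch: `have_cmd` is a hypothetical helper name, and the hint simply repeats the two commands given above.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd tesseract; then
    # Print the installed Tesseract version.
    tesseract -v
else
    echo "tesseract not found; try: sudo apt-get update && sudo apt-get install tesseract-ocr"
fi
```

Running `sudo apt-get update` first refreshes the local package index, which is the likely reason the second attempt found the package on a fresh instance.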