What is the difference between Pytesseract and Tesserocr?

I'm using Python 3.6 on Windows 10 and already have Pytesseract installed. I came across Tesserocr in some code, but I can't install it. What is the difference between the two?

Soufiane S asked Feb 19 '19 08:02



1 Answer

In my experience, tesserocr is much faster than pytesseract.

Tesserocr is a Python wrapper around the Tesseract C++ API, whereas pytesseract is a wrapper around the tesseract-ocr command-line tool.

With tesserocr you can therefore load the model once at the beginning of your program and then run it separately on each image (for example, in a loop that processes video frames). With pytesseract, every call to image_to_string launches a new tesseract process, which loads the model again before processing the image, so it is slower for video processing.

To install tesserocr, I just ran pip install tesserocr in the terminal.

To use tesserocr:

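import tesserocr
from PIL import Image

# Create the API object once; this is where the Tesseract model is loaded.
api = tesserocr.PyTessBaseAPI()
pil_image = Image.open('sample.jpg')
api.SetImage(pil_image)
text = api.GetUTF8Text()
api.End()  # release the Tesseract resources when finished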
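
The point above about loading the model once and reusing it in a loop could be sketched roughly as follows. This is only an illustration, not the one correct way to do it: ocr_files and its api_factory parameter are hypothetical names introduced here, and the import of tesserocr is deferred so that a stand-in object can be passed in for testing. It assumes the files are readable by Tesseract and uses PyTessBaseAPI as a context manager together with SetImageFile and GetUTF8Text.

```python
def ocr_files(paths, api_factory=None):
    """Recognize text in each file while initializing Tesseract only once.

    ocr_files and api_factory are illustrative names, not part of tesserocr.
    """
    if api_factory is None:
        # Imported lazily so the function can also be run with a stand-in API.
        import tesserocr
        api_factory = tesserocr.PyTessBaseAPI

    results = {}
    # One API object for the whole batch: the model is loaded on entry
    # and released on exit, instead of once per image.
    with api_factory() as api:
        for path in paths:
            api.SetImageFile(path)
            results[path] = api.GetUTF8Text()
    return results
```

For video, the same idea applies: create the API object before the frame loop and call SetImage and GetUTF8Text inside it.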

To install pytesseract: pip install pytesseract. (The Tesseract executable itself must also be installed, since pytesseract calls it.)

To run it:

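import pytesseract
import cv2

image = cv2.imread('sample.jpg')
# OpenCV loads images as BGR, while pytesseract expects RGB.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image)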
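
Since the speed claim comes from my own experience, it may be worth measuring on your own images. A minimal timing helper might look like the sketch below; mean_seconds is a hypothetical name introduced for illustration, and it simply times any callable that takes one image.

```python
import time


def mean_seconds(ocr, images, repeats=3):
    """Return the average wall-clock seconds per call of ocr(image)."""
    images = list(images)
    if not images:
        raise ValueError("need at least one image")
    start = time.perf_counter()
    for _ in range(repeats):
        for image in images:
            ocr(image)
    elapsed = time.perf_counter() - start
    # Total time divided by the total number of OCR calls made.
    return elapsed / (repeats * len(images))
```

You could pass pytesseract.image_to_string as ocr for one measurement, and for the other a small function that calls SetImage and GetUTF8Text on an already created PyTessBaseAPI object, then compare the two averages.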

Houssam ASSANY answered Oct 05 '22 00:10