
How to get Tesseract confidence levels in Python or from the command window?

How can I get the confidence levels for the OCR result of an image using Tesseract 3.05 on Windows? I am calling tesseract from Python with subprocess:

import subprocess
retcode = subprocess.call(["tesseract", "-l", "eng", "myImage.png", "txt", "-psm", "6"])

c.Parsi asked Oct 17 '25 14:10


1 Answer

The wrapper you need is tesserocr: https://pypi.python.org/pypi/tesserocr/2.0.0 . There are many Python wrappers for Tesseract, but this one comes closest to covering the full C++ API. Its MeanTextConf() method returns the mean confidence (0 to 100) of the text recognized in the current region.

Example:

from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('/usr/src/tesseract/testing/phototest.tif')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    # Split the page into text-line components
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        # Mean confidence (0-100) for the text in this rectangle
        conf = api.MeanTextConf()
        print(u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
              "confidence: {1}, text: {2}".format(i, conf, ocrResult, **box))
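
If you would rather stay with the command-line call from the question, one possible route is Tesseract's TSV output, which includes a per-word conf column. The following is only a sketch under some assumptions: that your tesseract build ships the "tsv" config file (it appeared around version 3.05), that the binary is on PATH, and that the output base name "out" is acceptable. The function names and the sample data are made up for illustration.

```python
import csv
import io
import subprocess


def run_tesseract_tsv(image_path, output_base="out"):
    """Run the tesseract CLI with the 'tsv' config and return the TSV text.

    Assumptions: tesseract is on PATH and provides the 'tsv' config.
    '-psm' is the 3.x spelling of the option; 4.x and later use '--psm'.
    """
    subprocess.check_call(
        ["tesseract", image_path, output_base, "-l", "eng", "-psm", "6", "tsv"]
    )
    # tesseract writes its result to <output_base>.tsv
    with io.open(output_base + ".tsv", encoding="utf-8") as f:
        return f.read()


def parse_word_confidences(tsv_text):
    """Return (word, confidence) pairs from tesseract TSV output.

    Word rows have level 5; layout rows carry conf -1 and empty text,
    so they are skipped.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row.get("conf") or -1)
        if row.get("level") == "5" and text and conf >= 0:
            words.append((text, conf))
    return words


# Made-up sample in the TSV column layout, for illustration only.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t110\t92\t70\t24\t88\tworld\n"
)
```

With a real image you would call parse_word_confidences(run_tesseract_tsv("myImage.png")) and, if needed, average the returned confidences yourself to get a page-level figure.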
Vu Gia Truong answered Oct 21 '25 20:10


