
How to obtain the trust rate of an OCR output?

Is there a way to get the trust rate of an OCR output produced by Pytesseract? By trust rate I mean the correctness percentage of the OCR output.

Example:

text = pytesseract.image_to_string(editedImage) 

I would also like to show the trust rate for this text string, if possible.

Edit: I tried image_to_data, but I got an error:

print(pytesseract.image_to_data(Image.open('test.png')))



Traceback (most recent call last):
  File "/usr/lib/python3.4/tkinter/__init__.py", line 1536, in __call__
    return self.func(*args)
  File "/home/caner/Desktop/Met/OCR-METv3/venv/tkgui.py", line 192, in convert
    print(pytesseract.image_to_data(Image.open('test.png')))
  File "/home/caner/Desktop/Met/OCR-METv3/venv/lib/python3.4/site-packages/pytesseract/pytesseract.py", line 232, in image_to_data
    return run_and_get_output(image, 'tsv', lang, config, nice)
  File "/home/caner/Desktop/Met/OCR-METv3/venv/lib/python3.4/site-packages/pytesseract/pytesseract.py", line 142, in run_and_get_output
    with open(filename, 'rb') as output_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_2mxczh8n_out.tsv' 
caner karagüler asked Oct 25 '25 23:10



1 Answer

My guess is that by "trust rate" you mean confidence. The README in the pytesseract repository has some information on this.

Functions

  • image_to_string Returns the result of a Tesseract OCR run on the image to string
  • image_to_boxes Returns result containing recognized characters and their box boundaries
  • image_to_data Returns result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, please check the Tesseract TSV documentation
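
The 3.05+ requirement may be relevant to the FileNotFoundError in the question: one possible cause (a guess, not confirmed by the traceback) is an installed Tesseract older than 3.05 that never writes the .tsv file pytesseract then tries to open. A rough sketch for checking the installed version follows. The helper names are made up for illustration, and the regular expression assumes the version banner starts with something like "tesseract 3.04.01" or "tesseract v5.0.0".

```python
import re
import subprocess


def parse_tesseract_version(banner):
    """Extract (major, minor) from the output of `tesseract --version`."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("could not find a version number in: %r" % banner)
    return int(match.group(1)), int(match.group(2))


def supports_tsv(banner):
    # image_to_data requires Tesseract 3.05 or newer (per the pytesseract README).
    return parse_tesseract_version(banner) >= (3, 5)


def installed_tesseract_banner():
    # Merge stderr into stdout, since some releases print the banner to stderr.
    # Assumes the `tesseract` executable is on PATH.
    raw = subprocess.check_output(["tesseract", "--version"],
                                  stderr=subprocess.STDOUT)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    banner = installed_tesseract_banner()
    print(parse_tesseract_version(banner), supports_tsv(banner))
```

If `supports_tsv` comes back false, upgrading Tesseract itself (not just the pytesseract package) would be the thing to try.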

I think what you're looking for is the image_to_data function.
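
A minimal sketch of how that could look, assuming a Tesseract version that supports TSV output. It asks image_to_data for a dict (`output_type=pytesseract.Output.DICT`) and averages the per-word `conf` values. The helper names are my own, and averaging is just one simple way to boil the per-word values down to a single number. Keep in mind that `conf` is Tesseract's own 0 to 100 estimate for each word, not a measured correctness percentage, and that rows which are not words carry a confidence of -1.

```python
def word_confidences(data):
    """Return (word, confidence) pairs from an image_to_data dict.

    Rows with a confidence of -1 (blocks, paragraphs, lines) and rows with
    empty text are skipped.
    """
    pairs = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # may be an int, float, or str depending on the version
        if conf >= 0 and str(text).strip():
            pairs.append((str(text), conf))
    return pairs


def mean_confidence(data):
    """Average word confidence, or None if no words were recognized."""
    pairs = word_confidences(data)
    if not pairs:
        return None
    return sum(conf for _, conf in pairs) / len(pairs)


def ocr_with_confidence(path):
    # Imported here so the helpers above work without pytesseract installed.
    from PIL import Image
    import pytesseract

    data = pytesseract.image_to_data(Image.open(path),
                                     output_type=pytesseract.Output.DICT)
    text = " ".join(word for word, _ in word_confidences(data))
    return text, mean_confidence(data)


if __name__ == "__main__":
    text, confidence = ocr_with_confidence("test.png")
    print(text, confidence)
```

Note that the text rebuilt here is just the recognized words joined by spaces, so it loses the line breaks that image_to_string would give you.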

neznidalibor answered Oct 28 '25 13:10
