 

Tesseract OCR confuses slashed 0 as 8

I have trained Tesseract on the Terminus font, but no matter what I try, I can't get it to recognize the 0s. I am using jTessBoxEditor to create the training TIFF and box files. Even when validating, it reads all 0s as 8s. Is there anything I am missing?

Here is an example of a 0 that it reads as an 8:

I use the following parameters:

--psm 10 -c tessedit_char_whitelist=0123456789# --oem 3 -l terminus
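
For reference, the same options can also be passed from Python. This is only a minimal sketch, assuming the pytesseract wrapper is installed and that the trained `terminus` language data is visible to Tesseract; `build_config` and `read_single_char` are hypothetical helper names, not part of any library.

```python
def build_config(whitelist="0123456789#", psm=10, oem=3):
    """Assemble the Tesseract option string listed above."""
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def read_single_char(image_path, lang="terminus"):
    """Run Tesseract on a single-character image (hypothetical helper)."""
    # Imported lazily so build_config can be used without pytesseract installed.
    import pytesseract

    text = pytesseract.image_to_string(
        image_path, lang=lang, config=build_config()
    )
    return text.strip()
```

Calling `read_single_char("zero.png")` (a placeholder file name) would then run the same recognition as the command-line options above.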

Vilsol asked Sep 13 '25 03:09


1 Answer

EasyOCR is a lightweight model that performs well on receipts and PDF conversion. It gives more accurate results on organized text such as PDF files, receipts, and bills. EasyOCR also performs well on noisy images and recognizes numbers better than pytesseract.

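Code:

    # In a notebook, install the package first:
    # !pip install easyocr

    import cv2
    import easyocr

    # Load the image (replace "image path" with the path to your file)
    img = cv2.imread("image path")
    # Initialize the OCR reader for English
    text_reader = easyocr.Reader(['en'])
    # Each result is a (bounding box, text, confidence) tuple
    results = text_reader.readtext(img)
    for (bbox, text, prob) in results:
        print(text)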
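
Since the question only involves digits and `#`, it may help to restrict EasyOCR to those characters. The sketch below assumes `readtext` is called with its `allowlist` argument; `keep_confident` and `read_digits` are hypothetical helpers, and the 0.5 threshold is an arbitrary choice rather than a recommended value.

```python
def keep_confident(results, min_prob=0.5):
    """Return the text of detections whose confidence is at least min_prob."""
    return [text for (_bbox, text, prob) in results if prob >= min_prob]


def read_digits(image_path, allowed="0123456789#"):
    """Read an image with EasyOCR, limited to the allowed characters."""
    # Imported lazily so keep_confident can be used without easyocr installed.
    import easyocr

    reader = easyocr.Reader(['en'])
    # allowlist restricts recognition to the given characters
    results = reader.readtext(image_path, allowlist=allowed)
    return keep_confident(results)
```

Whether this resolves the slashed-zero confusion for a particular font is something to check on your own images.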
Chadou Mohamed Ali answered Sep 15 '25 17:09