Tesseract OCR confuses slashed 0 as 8

Question

I have trained Tesseract on the Terminus font, but no matter what I try, I can't get it to recognize the 0s. I am using jTessEditor to create the training TIFF and box files. Even during validation, it reads every 0 as an 8. Is there anything I am missing?

Here is an example of a 0 that it reads as 8:

I use the following parameters:

--psm 10 -c tessedit_char_whitelist=0123456789# --oem 3 -l terminus
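
For reference, here is a minimal sketch of how these parameters might be passed from Python. It assumes the pytesseract wrapper and Pillow are used, which the question does not state. The helper names `build_tesseract_config` and `read_single_char` are hypothetical and only serve the illustration.

```python
def build_tesseract_config(psm: int, oem: int, whitelist: str) -> str:
    """Assemble a Tesseract config string from the flags in the question."""
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def read_single_char(image_path: str, lang: str = "terminus") -> str:
    """OCR a single-character image (assumes pytesseract and Pillow are installed)."""
    # Imported lazily so the config helper can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    config = build_tesseract_config(psm=10, oem=3, whitelist="0123456789#")
    # "-l terminus" on the command line corresponds to the lang argument here.
    text = pytesseract.image_to_string(Image.open(image_path), lang=lang, config=config)
    return text.strip()
```

Keeping the language out of the config string and passing it as `lang` avoids specifying it twice.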

Chadou Mohamed Ali · Accepted Answer

EasyOCR is a lightweight model that performs well on receipts and PDF conversion. It gives more accurate results on structured text such as PDF files, receipts, and bills. EasyOCR also handles noisy images well and recognizes numbers better than pytesseract.

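code:

    # Install first: pip install easyocr (prefix with ! inside a notebook)
    import cv2
    import easyocr

    # Load the image (replace "image path" with the path to your file)
    img = cv2.imread("image path")

    # Initialize the OCR reader for English
    text_reader = easyocr.Reader(['en'])

    # readtext returns (bounding box, text, confidence) tuples
    results = text_reader.readtext(img)
    for (bbox, text, prob) in results:
        print(text)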
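
Since the question restricts output to digits and `#`, the same restriction can be approximated on the EasyOCR side by post-filtering the `(bbox, text, prob)` tuples. The following is a rough sketch, not part of the original answer. The function name `filter_results`, the 0.5 confidence threshold, and the sample data are all assumptions made for illustration.

```python
def filter_results(results, allowed="0123456789#", min_prob=0.5):
    """Keep only allowed characters from sufficiently confident detections."""
    cleaned = []
    for _bbox, text, prob in results:
        # Skip detections below the (assumed) confidence threshold.
        if prob < min_prob:
            continue
        # Drop every character that is not in the allowed set.
        kept = "".join(ch for ch in text if ch in allowed)
        if kept:
            cleaned.append((kept, prob))
    return cleaned


# Hypothetical, hand-written sample shaped like readtext output.
sample = [
    ([[0, 0], [10, 0], [10, 10], [0, 10]], "0#7", 0.93),
    ([[0, 20], [10, 20], [10, 30], [0, 30]], "O8", 0.41),
    ([[0, 40], [10, 40], [10, 50], [0, 50]], "a5", 0.80),
]
print(filter_results(sample))
```

EasyOCR's `readtext` also accepts an `allowlist` argument that restricts recognition itself rather than filtering afterwards. Whether it helps with a slashed 0 in Terminus has not been tested here.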

Tags:

python

ocr

tesseract

Vilsol
