I have trained Tesseract on the Terminus font, but no matter what I do, I can't get it to recognize 0s. I am using jTessBoxEditor to create the training TIFF and box files. Even when validating, it reads every 0 as an 8. Is there anything I am missing?
Here is an example of a 0 being read as an 8:
I use the following parameters:
--psm 10 -c tessedit_char_whitelist=0123456789# --oem 3 -l terminus
EasyOCR is a lightweight OCR library that performs well on receipts and PDF conversion. It tends to be more accurate on organized text such as PDF files, receipts, and bills. In my experience, EasyOCR also copes well with noisy images and recognizes numbers better than pytesseract.
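code:
# Install first (in a notebook cell): !pip install easyocr
import cv2
import easyocr

# Load the image (replace "image path" with the path to your file)
img = cv2.imread("image path")

# Initialize the OCR reader for English
text_reader = easyocr.Reader(['en'])

# readtext returns a list of (bounding box, text, confidence) tuples
results = text_reader.readtext(img)
for (bbox, text, prob) in results:
    print(text)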
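Since the question restricts the output to digits and '#', a rough EasyOCR equivalent might look like the sketch below. It relies on the allowlist argument of readtext, which plays a role similar to Tesseract's tessedit_char_whitelist. The helper names digits_from_results and read_digits and the 0.5 confidence threshold are my own, chosen for illustration, and I have not tried this on the Terminus font.

```python
def digits_from_results(results, min_prob=0.5):
    """Join the text of detections whose confidence is at least min_prob.

    `results` is a list of (bbox, text, prob) tuples, the shape that
    readtext returns with its default settings.
    """
    return "".join(text for (_bbox, text, prob) in results if prob >= min_prob)


def read_digits(image_path, min_prob=0.5):
    """Hypothetical helper: OCR an image, allowing only digits and '#'."""
    # Imported here so digits_from_results can be used without EasyOCR installed.
    import easyocr

    reader = easyocr.Reader(['en'])
    # allowlist restricts recognition to the listed characters.
    results = reader.readtext(image_path, allowlist='0123456789#')
    return digits_from_results(results, min_prob)
```

Calling read_digits("image path") would then return the recognized characters as one string. Dropping low-confidence detections is optional, so set min_prob to 0 to keep everything.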