
Pytesseract set character whitelist

Does anyone know how to set the character whitelist for Pytesseract? I want it to output only A-Z, a-z, and 0-9. Is this possible? I have the following:

from PIL import Image
import pytesseract
img = Image.open('test.jpg')
result = pytesseract.image_to_string(img, config='-psm 6')

I'm getting stray characters, such as / in place of 1, so I would like to restrict the set of characters it can output.

Minato10 asked Apr 30 '17 10:04


1 Answer

You can do that with the line below. Alternatively, you can set up a Tesseract config file to do the same thing; see "Limit characters tesseract is looking for".

pytesseract.image_to_string(question_img, config="-c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz -psm 6")
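Note that the whitelist in that line covers digits and lowercase letters only, while the question also asked for uppercase. Below is a minimal sketch of building the config string programmatically so that both cases are included. The helper names `build_whitelist_config` and `ocr_alnum` are made up for illustration and are not part of pytesseract. The sketch also assumes that Tesseract 4 and later expect `--psm`, whereas the `-psm` spelling above is the 3.x form, so a switch is provided for either.

```python
import string


def build_whitelist_config(chars: str, psm: int = 6, legacy_flag: bool = False) -> str:
    """Build a Tesseract config string that restricts output to `chars`.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    # Tesseract 3.x spelled the flag "-psm"; 4.x and later use "--psm".
    psm_flag = "-psm" if legacy_flag else "--psm"
    return f"-c tessedit_char_whitelist={chars} {psm_flag} {psm}"


# Digits plus lowercase and uppercase letters.
ALNUM = string.digits + string.ascii_letters
config = build_whitelist_config(ALNUM)


def ocr_alnum(path: str) -> str:
    """Run OCR on the image at `path` using the whitelist config above."""
    # Imported lazily so the helper above can be used without OCR dependencies.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), config=config)
```

Calling `ocr_alnum('test.jpg')` would then run the same kind of call as the one-liner above, with the wider whitelist. Pass `legacy_flag=True` to `build_whitelist_config` if you are still on Tesseract 3.x.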

I am sure there are other ways to get it to work, but this is what worked for me.
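For the config file route, here is a rough sketch of generating such a file. It assumes the usual Tesseract config format of one `variable value` pair per line. The helper `write_whitelist_config` and the file name `alnum` are hypothetical, and the file is written to a temporary directory only to keep the example self-contained.

```python
import os
import string
import tempfile


def write_whitelist_config(directory: str, name: str, chars: str) -> str:
    """Write a Tesseract config file that sets tessedit_char_whitelist.

    Hypothetical helper; `directory` and `name` are illustrative choices.
    """
    path = os.path.join(directory, name)
    with open(path, "w", encoding="ascii") as fh:
        # Each line of a config file holds a variable name, a space, and its value.
        fh.write(f"tessedit_char_whitelist {chars}\n")
    return path


config_dir = tempfile.mkdtemp()
config_path = write_whitelist_config(
    config_dir, "alnum", string.digits + string.ascii_lowercase
)

# Read the file back to show what Tesseract would see.
with open(config_path, encoding="ascii") as fh:
    contents = fh.read()
```

Tesseract takes config file names as trailing command-line arguments, so if the file is placed where your installation looks for configs (typically the `configs` directory under tessdata), something like `config="-psm 6 alnum"` should pick it up. I have not verified this against every Tesseract version.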

James Vaughn answered Sep 21 '22 14:09