
How to get character positions in pytesseract

I am trying to get the position of each character in an image file using the pytesseract library.

import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open('5.png')))

Is there any library for getting the position of each character?

Chandy Alex asked Aug 24 '15 05:08



2 Answers

Did you try using pytesseract.image_to_data()?

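import cv2
import pytesseract

img = cv2.imread('5.png')
# image_to_data returns one entry per detected element (page, block, line, word)
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
for i in range(len(data['level'])):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    # Draw the bounding box on the image
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)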
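
Note that image_to_data reports boxes for pages, blocks, lines and words, not for individual characters. For per-character positions, pytesseract.image_to_boxes may be closer to what the question asks for. Below is a minimal sketch, assuming the usual Tesseract box format (one line per character: `char left bottom right top page`, with the origin at the bottom-left corner of the image). The helper names parse_char_boxes and char_positions are hypothetical, not part of pytesseract.

```python
def parse_char_boxes(box_text, image_height):
    """Turn Tesseract box-format text into (char, x, y, w, h) tuples.

    Box files measure y from the bottom of the image, so the values are
    flipped here to the top-left origin used by PIL and OpenCV.
    """
    boxes = []
    for line in box_text.splitlines():
        # Split from the right so the character itself stays in one piece.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(value) for value in parts[1:5])
        boxes.append((char, left, image_height - top, right - left, top - bottom))
    return boxes


def char_positions(image_path):
    """Hypothetical convenience wrapper: run OCR and return character boxes."""
    # Imported lazily so parse_char_boxes can be used without Tesseract installed.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    return parse_char_boxes(pytesseract.image_to_boxes(image), image.height)
```

Calling char_positions('5.png') should then give one tuple per recognized character, which can be passed to cv2.rectangle in the same way as the word boxes above.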
Sang9xpro answered Oct 24 '22 00:10



pytesseract doesn't seem like the best tool for getting positions, but you can do this:

from pytesseract import pytesseract
# The "hocr" config makes Tesseract write hOCR output, which includes bounding boxes
pytesseract.run_tesseract('image.png', 'output', lang=None, boxes=False, config="hocr")
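
The hOCR file produced this way is HTML in which each recognized word is a span of class ocrx_word whose title attribute carries `bbox x0 y0 x1 y1` (top-left origin). Below is a rough sketch of reading those boxes back with only the standard library. It assumes plain word-level hOCR with no nested character spans, and the names HocrWordParser and parse_hocr_words are hypothetical.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocrx_word":
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(value) for value in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no spans are nested inside a word span.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words
```

The output file name depends on the Tesseract version, so check what was actually written before reading it and passing its contents to parse_hocr_words.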
el Josso answered Oct 24 '22 01:10
