tesseract reading values from a table

Question

My question follows this post about extracting data from a table in an image using OCR.

I'm using tesseract to convert a table image to text. This works well, except that the layout of the table is not preserved. One solution is to mark the column boundaries with letters that tesseract would recognize, fooling it into treating the table as plain text.

Here is an example of a table without column lines: [image]

I use the following code to draw columns of "QQ":

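from PIL import Image, ImageDraw, ImageFont

im=Image.open("file.png")
draw  = ImageDraw.Draw(im)
font=ImageFont.truetype("/usr/share/fonts/gnu-free/FreeSerifBold.ttf",12)
# text height in pixels, used as the vertical step
# (font.getsize was removed in Pillow 10; font.getbbox("S")[3] is the equivalent there)
by = font.getsize("S")[1]
col = [240,480]  # x positions where the "QQ" markers are drawn

# stamp "QQ" at every column position, one row per text height
for y in range(0,im.size[1],by):
    for x in col:
        draw.text((x,y),"QQ",font=font,fill=0)

im.save("res-file.png")
im.show()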

which gives me the following image: [image]

The problem is that tesseract does not even recognize the QQ. I also drew the QQ columns on a blank page, and tesseract didn't recognize them there either.

Is there a way to convert this table in png format to text using tesseract? Is there something that escaped me?

DJJ · Accepted Answer

The main issue was the vertical spacing of the QQ markers. After adding some vertical space between them and doubling the size of the image, I got acceptable results:

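from PIL import Image, ImageDraw, ImageFont

im=Image.open("file.png")
draw  = ImageDraw.Draw(im)
font=ImageFont.truetype("/usr/share/fonts/gnu-free/FreeSerifBold.ttf",12)
# text height in pixels
# (font.getsize was removed in Pillow 10; font.getbbox("S")[3] is the equivalent there)
by = font.getsize("S")[1]
col = [240,480]  # x positions where the "QQ" markers are drawn

# leave 5 extra pixels between successive rows of markers
for y in range(0,im.size[1],by+5):
    for x in col:
        draw.text((x,y),"QQ",font=font,fill=0)
# double the image size before running OCR
im=im.resize((im.size[0]*2,im.size[1]*2))
im.save("res-file.png")
im.show()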
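
If you would rather stay in Python than switch to a shell, the saved image can be passed to the tesseract executable with subprocess. This is only a sketch: the helper names tesseract_command and run_tesseract are made up for illustration, and it assumes the tesseract binary is on your PATH. It relies on tesseract appending ".txt" to the output base name.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base):
    """Build the same argument list as `tesseract <image> <output base>`."""
    return ["tesseract", str(image_path), str(output_base)]


def run_tesseract(image_path="res-file.png", output_base="outputfile"):
    """Run tesseract on the image and return the recognized text.

    Assumes the tesseract executable is installed and on PATH;
    subprocess raises CalledProcessError if tesseract exits with an error.
    """
    subprocess.run(tesseract_command(image_path, output_base), check=True)
    # tesseract writes its result to "<output_base>.txt"
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling run_tesseract() with no arguments uses the file names from the code above.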

Here is the text obtained after running tesseract res-file.png outputfile:

8888
8888

Shop §Date open§Town
Mike §2007-Nov-09§sourisvi||e
Abbibas §2009-Jan-05§magic city
Iuma $2009-Jan-05§vani||a
Beboot §2009-Mar-31§at|antis
bascics $2008-Jun-27§o|d york
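
In the output above the QQ markers come back as "§" (sometimes "$"), so the columns can be recovered by splitting each line on those characters. Below is a rough sketch of that step. The function name split_rows and its parameters are hypothetical, and the optional replacement of "|" with "l" is a guess based on this particular output (for example "at|antis"); turn it off if your table contains real "|" characters.

```python
import re


def split_rows(ocr_text, markers="§$", fix_pipes=True):
    """Split tesseract output into rows of cells using the marker characters."""
    pattern = "[" + re.escape(markers) + "]"
    rows = []
    for line in ocr_text.splitlines():
        if not re.search(pattern, line):
            continue  # skip blank lines and lines without a column marker
        cells = [cell.strip() for cell in re.split(pattern, line)]
        if fix_pipes:
            # assumption: "|" is a misread lowercase "l" in this output
            cells = [cell.replace("|", "l") for cell in cells]
        rows.append(cells)
    return rows
```

For example, split_rows(open("outputfile.txt", encoding="utf-8").read()) gives one list of three strings per table row, and the stray "8888" lines are dropped because they contain no marker.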

Tags:

python

tesseract

python-tesseract
