
What do the keys in the dictionary returned by the following Tesseract code signify?

I am using the following code in Python:

    import pprint
    import cv2
    import pytesseract

    img = cv2.imread('../Image_documents/6.png')
    d = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    pprint.pprint(d)

I am getting the following keys in the dictionary:

'block_num', 'conf', 'level', 'line_num', 'page_num', 'par_num', 'text', 'top', 'width', 'word_num', 'height', 'left'.

What do these keys signify?

I tried to find these in the official Tesseract documentation. If you have links that explain them, please share them or explain the keys here.
Mayank Kumar asked Jun 21 '19 07:06



1 Answer

You called an API that returns information about the text in your image.

The best way to think about the response is as a collection of boxes (rectangles) on the image that highlight text areas.

The result set contains entries for several levels of the layout hierarchy.

Check the value of the level key to see which level a box belongs to. The supported values are listed below:

  1. page
  2. block
  3. paragraph
  4. line
  5. word

An image can contain multiple boxes at the same level, so the attributes page_num, block_num, par_num, line_num, and word_num identify a box's position within its level and within its parents in the hierarchy.

The left and top values give the pixel coordinates of the box's top-left corner, and width and height give its size.

Let's look at a sample to see how it works.

Assume we have a picture with two words on the same line.

For that picture, Tesseract returns 6 boxes: 1 for the page, 1 for the block, 1 for the paragraph, 1 for the line, and 2 for the words.

This is the data you get:

  • 'level': [1, 2, 3, 4, 5, 5]
  • 'page_num': [1, 1, 1, 1, 1, 1]
  • 'block_num': [0, 1, 1, 1, 1, 1]
  • 'par_num': [0, 0, 1, 1, 1, 1]
  • 'line_num': [0, 0, 0, 1, 1, 1]
  • 'word_num': [0, 0, 0, 0, 1, 2]
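To make the hierarchy in these values more concrete, here is a rough sketch that reads each row's position from the five counters and looks up its parent box. The `position` and `parent_index` helpers are hypothetical and written only for this example. The parent lookup assumes that a row's counters below its own level are zero and that each counter restarts inside its parent, which is what the sample values suggest.

```python
# Sample values copied from the listing above.
d = {
    "level": [1, 2, 3, 4, 5, 5],
    "page_num": [1, 1, 1, 1, 1, 1],
    "block_num": [0, 1, 1, 1, 1, 1],
    "par_num": [0, 0, 1, 1, 1, 1],
    "line_num": [0, 0, 0, 1, 1, 1],
    "word_num": [0, 0, 0, 0, 1, 2],
}

# Counters ordered from the outermost level to the innermost one.
KEYS = ("page_num", "block_num", "par_num", "line_num", "word_num")


def position(d, i):
    """Return the (page, block, paragraph, line, word) counters of row i."""
    return tuple(d[key][i] for key in KEYS)


def parent_index(d, i):
    """Return the index of the row one level up that contains row i.

    Returns None for a page, which has no parent. Assumes the parent shares
    the counters above row i's own level (an assumption of this sketch).
    """
    level = d["level"][i]
    if level == 1:
        return None
    prefix = position(d, i)[: level - 1]
    for j, other_level in enumerate(d["level"]):
        if other_level == level - 1 and position(d, j)[: level - 1] == prefix:
            return j
    return None


for i in range(len(d["level"])):
    print(i, position(d, i), "parent row:", parent_index(d, i))
```

In this sample both word rows point back to the single line row, which in turn points to the paragraph, the block, and finally the page.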

The code below draws the boxes for every level on the image:

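    import cv2
    import pytesseract
    from pytesseract import Output

    image = cv2.imread('../Image_documents/6.png')  # path taken from the question
    d = pytesseract.image_to_data(image, output_type=Output.DICT)
    n_boxes = len(d['level'])
    for i in range(n_boxes):
        # left/top are the top-left corner; width/height are the box size
        x, y, w, h = d['left'][i], d['top'][i], d['width'][i], d['height'][i]
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)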
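The remaining two keys from the question, conf and text, hold the recognition confidence and the recognized string. As far as I know, rows above the word level carry a conf of -1 and an empty text, and the type of the conf values (int, float, or string) can differ between pytesseract versions, so the sketch below converts them with `float()`. The `word_boxes` helper, the threshold of 60, and the `sample` data are all made up for illustration.

```python
def word_boxes(d, min_conf=60.0):
    """Collect (text, (x, y, w, h)) for word-level rows at or above min_conf."""
    boxes = []
    for i, level in enumerate(d["level"]):
        if level != 5:  # 5 is the word level
            continue
        text = d["text"][i].strip()
        # conf may arrive as int, float, or str depending on the version,
        # so normalise it before comparing (an assumption of this sketch).
        conf = float(d["conf"][i])
        if text and conf >= min_conf:
            box = (d["left"][i], d["top"][i], d["width"][i], d["height"][i])
            boxes.append((text, box))
    return boxes


# Made-up data shaped like the image_to_data dictionary.
sample = {
    "level": [1, 4, 5, 5],
    "conf": ["-1", "-1", 96, 31.5],
    "text": ["", "", "Hello", "w0rld"],
    "left": [0, 10, 10, 80],
    "top": [0, 12, 12, 12],
    "width": [200, 130, 60, 60],
    "height": [50, 20, 20, 20],
}

result = word_boxes(sample)
print(result)
```

With real output you could pass the `d` returned by `image_to_data` and draw only the returned boxes with `cv2.rectangle`, which skips the page, block, paragraph, and line rectangles as well as low-confidence words.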
Dmitry Harnitski answered Nov 15 '22 09:11
