I am using the following code in Python:
import pprint
import cv2
import pytesseract
img = cv2.imread('../Image_documents/6.png')
d = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
pprint.pprint(d)
I am getting the following keys in the dictionary:
'block_num', 'conf', 'level', 'line_num', 'page_num', 'par_num', 'text', 'top', 'width', 'word_num', 'height', 'left'.
What do these keys signify?
I tried to find them in the official Tesseract documentation without success. If you have links that explain them, please share them, or explain what they mean.
You called an API to get information about the text in your image.
The best way to think about the response is as a set of boxes (rectangles) on the image that highlight text areas.
The result set contains boxes at several different levels.
Check the value of the level key to see which level a box belongs to. Below are the supported values:
1 - page
2 - block
3 - paragraph
4 - line
5 - word
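As a quick way to see which levels are present in a result, the sketch below counts boxes per level. It assumes the usual Tesseract numbering (1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word); `LEVEL_NAMES` and `count_boxes_per_level` are illustrative names, not part of pytesseract.

```python
from collections import Counter

# Assumed level numbering as used in Tesseract's tabular output.
LEVEL_NAMES = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}


def count_boxes_per_level(d):
    """Count how many boxes the result dictionary holds at each level."""
    counts = Counter(LEVEL_NAMES.get(int(level), "unknown") for level in d["level"])
    return dict(counts)
```

Calling `count_boxes_per_level(d)` on the dictionary returned by `image_to_data` gives a small summary such as how many lines and words were detected.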
An image can contain multiple boxes of the same type. The page_num, block_num, par_num, line_num and word_num attributes define a box's position in the list and in its parent hierarchy.
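One use of these hierarchy attributes is to rebuild each line of text from its words. The following is a minimal sketch, assuming `d` is the dictionary returned with `Output.DICT` and that word boxes have level 5; `words_by_line` is a hypothetical helper name.

```python
from collections import defaultdict


def words_by_line(d):
    """Group word-level text by its (page, block, paragraph, line) position."""
    lines = defaultdict(list)
    for i, level in enumerate(d["level"]):
        if int(level) != 5:  # keep only word-level boxes
            continue
        key = (d["page_num"][i], d["block_num"][i], d["par_num"][i], d["line_num"][i])
        lines[key].append(d["text"][i])
    # Join the words of each line in the order they appear in the result
    return {key: " ".join(words) for key, words in lines.items()}
```

Each key in the returned dictionary identifies one line through its parent page, block and paragraph, so two words that share all four numbers end up in the same string.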
The top, width, height and left values define the position and size of the box.
Let's look at a sample to see how it works.
Assume we have a picture with 2 words on the same line.
For that picture Tesseract returns 6 boxes: 1 for the page, 1 for the block, 1 for the paragraph, 1 for the line and 2 for the words.
This is the data you get:
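To inspect this kind of data for your own image, the column-oriented dictionary can be turned into one row per box. This is a minimal sketch, assuming `d` contains the twelve keys listed in the question; `KEYS`, `to_rows` and `format_table` are illustrative names.

```python
# Column order chosen for readability: hierarchy first, then geometry, then content.
KEYS = ("level", "page_num", "block_num", "par_num", "line_num", "word_num",
        "left", "top", "width", "height", "conf", "text")


def to_rows(d):
    """Turn the column-oriented dictionary into one dict per box."""
    n = len(d["level"])
    return [{key: d[key][i] for key in KEYS} for i in range(n)]


def format_table(d):
    """Return a tab-separated table: a header line, then one line per box."""
    lines = ["\t".join(KEYS)]
    for row in to_rows(d):
        lines.append("\t".join(str(row[key]) for key in KEYS))
    return "\n".join(lines)
```

`print(format_table(d))` then shows one line per box. If pandas is installed, passing `output_type=Output.DATAFRAME` to `image_to_data` should give a similar table directly.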
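The code below draws the box for every level on the image:
import cv2
import pytesseract
from pytesseract import Output

image = cv2.imread('../Image_documents/6.png')  # same image as in the question
d = pytesseract.image_to_data(image, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    # left/top is the top-left corner; width/height is the box size
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)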
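Drawing every level produces nested rectangles, so it can help to pick one level and, for words, skip low-confidence results. The sketch below is one way to do that; `boxes_at_level` and its `min_conf` parameter are hypothetical, and it assumes that conf is a recognition confidence on a 0 to 100 scale, with -1 on boxes that are not words.

```python
def boxes_at_level(d, level=5, min_conf=0.0):
    """Return (x, y, w, h) tuples for boxes at one level.

    The confidence filter is applied only to word boxes (level 5).
    """
    boxes = []
    for i in range(len(d["level"])):
        if int(d["level"][i]) != level:
            continue
        # conf may arrive as int, float or string depending on the pytesseract version
        conf = float(d["conf"][i])
        if level == 5 and conf < min_conf:
            continue
        boxes.append((d["left"][i], d["top"][i], d["width"][i], d["height"][i]))
    return boxes
```

Each returned tuple can be passed to `cv2.rectangle` in the same way as in the loop above, for example by iterating over `boxes_at_level(d, level=5, min_conf=60)`.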