Does anyone knows the meaning of output of image_to_data, image_to_osd methods of pytesseract?

Question

I'm trying to extract the data from image using pytesseract. This module has image_to_data, image_to_osd methods. These two methods provides lot of info(TextLineOrder, WritingDirection, ScriptDetection, Orientation etc...) as output.

Below image is the output of image_to_data method. what does values of these columns(level, block_num, par_num, line_num, word_num) meaning?

enter image description here

Output of image_to_osd looks as below. What is the meaning each term in this?

Page number: 0 Orientation in degrees: 0 Rotate: 0 Orientation confidence: 16.47 Script: Latin Script confidence: 4.00

I refered docs but I did not get any info regarding these parameters.

livezingy · Accepted Answer

my_image.jpg

For example, Test the my_image.jpg with image_to_data in the following code, we will get the results like the results.png.

results.png

level = 1/2/3/4/5，the level of current item.
page_num: the page index of the current item. In most instances, a image only has one page.
block_num: the block item of the current item. when tesseract OCR Image, it will split the image into several blocks according the PSM parameters and some rules. The words in a line often in a block.
par_num: The paragraph index of the current item. It is the page analysis results. line_num: The line index of the current item. It is the page analysis results. word_num: The word index in one block.
line_num: The line index of the current item. It is the page analysis results.
word_num: The word index in one block.
left/top/width/height：the top-left coordinate and the width and height of the current word.
conf: the confidence of the current word, the range is -1~100.. The -1 means that there is no text here. The 100 is the highest value.
text: the word ocr results.

The meaning of the results from image_to_osd:

Page number: the page index of the current item. In most instances, a image only has one page.
Orientation in degrees: the clockwise rotation angle of the text in the current image relative to its reading angle, the value range is [0, 270, 180, 90].
Rotate: Record the angle at which the text in the current image is to be converted into readable, relative to the clockwise rotation of the current image, the value range is [0, 270, 180, 90]. Complementary to the [Orientation in degrees] value.
Orientation confidence:Indicates the confidence of the current [Orientation in degrees] and [Rotate] detection values. The greater the confidence, the more credible the test result, but no explanation of its value range has been found so far.
Script: The encoding type of the text in the current picture.
Script confidence: The confidence of the text encoding type in the current image.

from pytesseract import Output import pytesseract import cv2

image = cv2.imread("my_image.jpg")

#swap color channel ordering from BGR (OpenCV’s default) to RGB (compatible with Tesseract and pytesseract).
# By default OpenCV stores images in BGR format and since pytesseract assumes RGB format,
# we need to convert from BGR to RGB format/mode:
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
 
pytesseract.pytesseract.tesseract_cmd = r'C:\mypath	esseract.exe'
custom_config = r'-c tessedit_char_whitelist=0123456789 --psm 6'
results = pytesseract.image_to_data(rgb, output_type=Output.DICT,lang='eng',config=custom_config)
print(results)

Does anyone knows the meaning of output of image_to_data, image_to_osd methods of pytesseract?

Tags:

python

ocr

python-tesseract

Eswar RDS

1 Answers

livezingy

Recent Activity

Donate For Us

Does anyone knows the meaning of output of image_to_data, image_to_osd methods of pytesseract?

Tags:

python

ocr

python-tesseract

Eswar RDS

1 Answers

livezingy

Related questions

Recent Activity

Donate For Us