Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does anyone knows the meaning of output of image_to_data, image_to_osd methods of pytesseract?

I'm trying to extract the data from image using pytesseract. This module has image_to_data, image_to_osd methods. These two methods provides lot of info(TextLineOrder, WritingDirection, ScriptDetection, Orientation etc...) as output.

Below image is the output of image_to_data method. what does values of these columns(level, block_num, par_num, line_num, word_num) meaning?

enter image description here

Output of image_to_osd looks as below. What is the meaning each term in this?

Page number: 0 Orientation in degrees: 0 Rotate: 0 Orientation confidence: 16.47 Script: Latin Script confidence: 4.00

I refered docs but I did not get any info regarding these parameters.

like image 727
Eswar RDS Avatar asked Jan 25 '23 02:01

Eswar RDS


1 Answers

my_image.jpg

For example, Test the my_image.jpg with image_to_data in the following code, we will get the results like the results.png.

results.png

  • level = 1/2/3/4/5,the level of current item.

  • page_num: the page index of the current item. In most instances, a image only has one page.

  • block_num: the block item of the current item. when tesseract OCR Image, it will split the image into several blocks according the PSM parameters and some rules. The words in a line often in a block.

  • par_num: The paragraph index of the current item. It is the page analysis results. line_num: The line index of the current item. It is the page analysis results. word_num: The word index in one block.

  • line_num: The line index of the current item. It is the page analysis results.

  • word_num: The word index in one block.

  • left/top/width/height:the top-left coordinate and the width and height of the current word.

  • conf: the confidence of the current word, the range is -1~100.. The -1 means that there is no text here. The 100 is the highest value.

  • text: the word ocr results.

The meaning of the results from image_to_osd:

  • Page number: the page index of the current item. In most instances, a image only has one page.

  • Orientation in degrees: the clockwise rotation angle of the text in the current image relative to its reading angle, the value range is [0, 270, 180, 90].

  • Rotate: Record the angle at which the text in the current image is to be converted into readable, relative to the clockwise rotation of the current image, the value range is [0, 270, 180, 90]. Complementary to the [Orientation in degrees] value.

  • Orientation confidence:Indicates the confidence of the current [Orientation in degrees] and [Rotate] detection values. The greater the confidence, the more credible the test result, but no explanation of its value range has been found so far.

  • Script: The encoding type of the text in the current picture.

  • Script confidence: The confidence of the text encoding type in the current image.

from pytesseract import Output import pytesseract import cv2

image = cv2.imread("my_image.jpg")

#swap color channel ordering from BGR (OpenCV’s default) to RGB (compatible with Tesseract and pytesseract).
# By default OpenCV stores images in BGR format and since pytesseract assumes RGB format,
# we need to convert from BGR to RGB format/mode:
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
 
pytesseract.pytesseract.tesseract_cmd = r'C:\mypath\tesseract.exe'
custom_config = r'-c tessedit_char_whitelist=0123456789 --psm 6'
results = pytesseract.image_to_data(rgb, output_type=Output.DICT,lang='eng',config=custom_config)
print(results)
like image 155
livezingy Avatar answered Feb 13 '23 11:02

livezingy