Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Highlighting specific text in an image using python

I want to highlight specific words/sentences in a website screenshot.

Once the screenshot is taken, I extract the text using pytesseract and cv2. That works well and I can get text and data about it.

import pytesseract
import cv2


if __name__ == "__main__":
    img = cv2.imread('test.png')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    result = pytesseract.image_to_data(img, lang='eng', nice=0, output_type=pytesseract.Output.DICT)
    print(result)

Using the results object I can find needed words and sentences.

The question is how to go back to the image and highlight those word?

Should I be looking at other libraries or there is a way to get pixel values and then highlight the text?

Ideally, I would like to get start and end coordinates of each word, how can that be done?

like image 642
Califlower Avatar asked Jan 09 '19 17:01

Califlower


1 Answers

You can use pytesseract.image_to_boxes method to get the bounding box position of each character identified in your image. You can also use the method to draw bounding box around some specific characters if you want. Below code draws rectangles around my identified image.

import cv2
import pytesseract
import matplotlib.pyplot as plt

filename = 'sf.png'

# read the image and get the dimensions
img = cv2.imread(filename)
h, w, _ = img.shape # assumes color image

# run tesseract, returning the bounding boxes
boxes = pytesseract.image_to_boxes(img)use
print(pytesseract.image_to_string(img)) #print identified text

# draw the bounding boxes on the image
for b in boxes.splitlines():
    b = b.split()
    cv2.rectangle(img, ((int(b[1]), h - int(b[2]))), ((int(b[3]), h - int(b[4]))), (0, 255, 0), 2)

plt.imshow(img)

enter image description here

like image 139
Muthukrishnan Avatar answered Oct 24 '22 01:10

Muthukrishnan