Is it possible to extract text from a specific portion of an image using pytesseract?

I have a bounding box (the coordinates of a rectangle) in an image and want to extract the text inside it. How can I use pytesseract to extract the text within those coordinates?

I tried copying that portion of the image into another NumPy array using OpenCV, like this:

cropped_image = image[y1:y2][x1:x2]

and then passed it to pytesseract.image_to_string(), but the accuracy was very poor. When I passed the original image to pytesseract.image_to_string(), it extracted everything perfectly.

Is there any function to extract text from a specific portion of an image using pytesseract?

The image has several sections of information. Suppose I have the rectangle coordinates enclosing 'Online food delivering system'; how do I extract that text with pytesseract?

Please help. Thanks in advance.

Versions I am using: Tesseract 4.0.0, pytesseract 0.3.0, OpenCV 3.4.3

asked Nov 20 '19 07:11

Prem Kumar P

1 Answer

There's no built-in function in Pytesseract for extracting a specific portion of an image, but we can use OpenCV to extract the ROI bounding box and then pass that ROI to Pytesseract. We load the image as grayscale and apply Otsu's threshold to obtain a binary image. Assuming you have the desired ROI coordinates, we use NumPy slicing to extract the desired ROI.
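One detail worth checking is how the slice is written. The crop in the question, image[y1:y2][x1:x2], chains two slices, so the second one is applied to the rows again instead of the columns. That may explain the poor accuracy, although this has not been verified against the original image. A minimal sketch of the difference, using a small made-up array in place of a real image (the array and the box corners are illustrative assumptions):

```python
import numpy as np

# Hypothetical 6x8 "image" whose pixel value encodes its position (row * 10 + col).
image = np.array([[r * 10 + c for c in range(8)] for r in range(6)])

x1, y1, x2, y2 = 2, 1, 5, 4  # illustrative box corners, not from the original post

# Chained indexing: the second slice is applied to the rows again, not the columns.
chained = image[y1:y2][x1:x2]

# Two-dimensional indexing: rows y1..y2-1 and columns x1..x2-1.
roi = image[y1:y2, x1:x2]

print(chained.shape)  # (1, 8)
print(roi.shape)      # (3, 3)
```

The chained form returns a full-width strip of rows, while the comma form returns the intended 3x3 box.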

From here, we pass it to Pytesseract to get our result:

ONLINE FOOD DELIVERY SYSTEM

Code

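import cv2
import pytesseract

# Windows only: point pytesseract at the Tesseract executable (adjust or remove for your install)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load as grayscale (flag 0), then binarize with Otsu's threshold;
# inverting the THRESH_BINARY_INV result restores the original polarity
image = cv2.imread('1.jpg', 0)
thresh = 255 - cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# ROI as top-left corner (x, y) plus width and height; NumPy indexes rows (y) first
x, y, w, h = 37, 625, 309, 28
ROI = thresh[y:y+h, x:x+w]

# --psm 6 tells Tesseract to assume a single uniform block of text
data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6')
print(data)

# Show the thresholded image and the ROI until a key is pressed
cv2.imshow('thresh', thresh)
cv2.imshow('ROI', ROI)
cv2.waitKey()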
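The question describes the box by its corner coordinates, while the code above uses x, y, w, h. The sketch below bridges the two with a small helper. The name crop_box and its pad parameter are hypothetical, not part of OpenCV or pytesseract, and the blank array stands in for a real thresholded image. A few pixels of margin around the text is often suggested for OCR crops, but whether it helps on this particular image is untested.

```python
import numpy as np

def crop_box(image, x1, y1, x2, y2, pad=0):
    """Return the region of `image` inside the corner box, expanded by `pad` pixels."""
    height, width = image.shape[:2]
    # Order the corners so the box is valid even if they were given swapped.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    # Expand by the padding, then clamp to the image bounds.
    left = max(left - pad, 0)
    top = max(top - pad, 0)
    right = min(right + pad, width)
    bottom = min(bottom + pad, height)
    return image[top:bottom, left:right]

# Stand-in for a thresholded page; a real image would come from cv2.imread.
page = np.full((700, 400), 255, dtype=np.uint8)

# Same box as x, y, w, h = 37, 625, 309, 28, written as corners, with 5 px of margin.
roi = crop_box(page, 37, 625, 37 + 309, 625 + 28, pad=5)
print(roi.shape)  # (38, 319)
```

The returned array can be passed to pytesseract.image_to_string() in place of ROI in the code above.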

answered Oct 04 '22 01:10

nathancy

Tags: python, image, image-processing, opencv, ocr
