Opening Image file from url with PIL for text recognition with pytesseract

Tags: python-3.x, image, request, python-imaging-library, ocr

I am facing a confusing problem when I download an image, open it with BytesIO, and try to extract text from it using PIL and pytesseract.


>>> response = requests.get('http://abc/images/im.jpg')
>>> img = Image.open(BytesIO(response.content))
>>> img
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=217x16 at 0x7FDAD185CB38>
>>> text = pytesseract.image_to_string(img)
>>> text
''

Here it gives an empty string.

However, if I save the image and then open it again, pytesseract gives the right result.


>>> img.save('im1.jpg')
>>> im = Image.open('im1.jpg')
>>> pytesseract.image_to_string(im)
'The right text'

And just to confirm, both images have the same size.


>>> im.size
(217, 16)
>>> img.size
(217, 16)
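Matching sizes don't necessarily mean pytesseract sees the two images the same way. A minimal diagnostic sketch, assuming Pillow is installed: `describe_image` and `image_diff` are hypothetical helper names that collect the attributes Pillow exposes that could plausibly differ between the downloaded image and the re-opened file (format, mode, size, and the `info` metadata dict), and report only those that differ. Whether any difference explains the empty OCR result is not established; this only narrows the search.

```python
from PIL import Image


def describe_image(img):
    """Collect the Pillow attributes most likely to differ between two copies."""
    return {
        "format": img.format,
        "mode": img.mode,
        "size": img.size,
        # info holds per-file metadata such as dpi or JFIF fields
        **{f"info.{k}": v for k, v in img.info.items()},
    }


def image_diff(a, b):
    """Return {key: (value_in_a, value_in_b)} for every attribute that differs."""
    da, db = describe_image(a), describe_image(b)
    return {
        k: (da.get(k), db.get(k))
        for k in sorted(da.keys() | db.keys())
        if da.get(k) != db.get(k)
    }


# Usage with the question's objects (img from BytesIO, im from disk):
# print(image_diff(img, im))
```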

What can be the problem? Is it necessary to save the image or am I doing something wrong?
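Writing to disk shouldn't be required. If the save-and-reopen step is what makes the difference (an assumption; the question doesn't establish why it helps), the same round trip can be done in memory. A minimal sketch: `reencode` is a hypothetical helper that saves the image into an `io.BytesIO` buffer and reopens it. PNG as the default format is an assumption chosen because it is lossless; the question itself saved as JPEG.

```python
import io

from PIL import Image


def reencode(img, fmt="PNG"):
    """Save img to an in-memory buffer and reopen it, mimicking save-and-reopen."""
    buf = io.BytesIO()
    img.save(buf, format=fmt)
    buf.seek(0)  # rewind so Image.open reads from the start of the buffer
    out = Image.open(buf)
    out.load()  # decode pixel data now instead of lazily on first access
    return out


# Usage with the question's objects (assumes pytesseract is installed):
# text = pytesseract.image_to_string(reencode(img))
```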


asked Apr 13 '17 23:04

sprksh

1 Answer

You seem to have a problem that I can't reproduce. Diagnosing it, if there is a problem at all, would require many more details. But instead of asking for them, I'll assume (based on my overall experience) that in the process of providing those details your problem will vanish and can't be reproduced. In that sense, this answer is the solution to your problem.

If it is not, let me know if you need further assistance. At least you can be sure that your general approach is right and that, as far as can be seen, you did nothing wrong.

Here is the FULL code (your question doesn't show which modules are needed), using an image that is actually ONLINE so anyone else can test whether the code works (your question doesn't provide an existing online image):


import io
import requests
import pytesseract
from PIL import Image

# Download a publicly reachable test image
response = requests.get("http://www.teamjimmyjoe.com/wp-content/uploads/2014/09/Classic-Best-Funny-Text-Messages-earthquake-titties.jpg")
# print( type(response) ) # <class 'requests.models.Response'>

# Open the downloaded bytes directly from memory; no file on disk is needed
img = Image.open(io.BytesIO(response.content))
# print( type(img) ) # <class 'PIL.JpegImagePlugin.JpegImageFile'>

# Run Tesseract OCR on the in-memory image and print the recognized text
text = pytesseract.image_to_string(img)
print(text)

Here is the pytesseract output:


Hey! I just saw on CNN
there was an earthquake
near you. Are you ok?






‘ Yes! We‘re all line!

What did it rate on the titty
scale?
‘ Well they only jiggled a

little bit, so probably not

that high.
HAHAHAHAHAHA I LOVE
YOU
Richter scale. My phone is l
a 12 yr old boy.

My system: Linux Mint 18.1 with Python 3.6
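Following the same idea of self-contained, reproducible code, the script above can be split so each step is testable on its own. A sketch under the same library assumptions; `fetch_image_bytes` and `ocr_bytes` are hypothetical names. `raise_for_status()` makes an HTTP error (for example an HTML error page instead of an image) fail immediately rather than later as an image-decoding error, and the OCR callable is injectable so the decoding path can be checked without Tesseract installed.

```python
import io

from PIL import Image


def fetch_image_bytes(url, timeout=10):
    """Download url and return the raw response body, raising on HTTP errors."""
    import requests  # imported here so ocr_bytes works without requests installed

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of decoding an error page
    return response.content


def ocr_bytes(data, ocr=None):
    """Decode image bytes with Pillow and run an OCR callable on the result."""
    if ocr is None:
        import pytesseract  # imported lazily so callers can pass a stub instead

        ocr = pytesseract.image_to_string
    img = Image.open(io.BytesIO(data))
    img.load()  # decode fully before handing the image to the OCR function
    return ocr(img)


# Usage (needs network access, pytesseract and the tesseract binary):
# print(ocr_bytes(fetch_image_bytes(url)))
```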


answered Nov 03 '22 01:11

Claudio

Related questions

Android - Make a portion of an image repeatable in android?
jQuery check if image already loaded before binding a .load() event [duplicate]
What is the best practice for setting image size? css or attributes? [closed]
How to enable php_fileinfo extension in PHP?
How can I create a PNG image file from a list of pixel values in Python?
Cropping very large fits files using specified boundaries
How can I write pure HTML like <img> tag in ERB?
Android - When attempting to add an image creates a blank image
Image.Save does not save image data to file
How to add an image in TCPDF
Add background image to 3d plot
Upload to backblaze from client
PIL: add a text at the bottom middle of image
How to lazy load a picture instead of waiting for it to be finished downloading in Java?
Correct sizes for img srcset in a container element?
Laravel : How to get random image from directory?
"Imagick::flattenImages method is deprecated and it's use should be avoided"
WordPress removing original image after resize?
Path to Images not Working in Angular 2
Despeckle - Remove spots or dots from the image
