What's the fastest way to read images from URLs?

I want to write a generator that yields batches of images downloaded from URLs to train a Keras model. Another generator already supplies the image URLs.

Currently I download each image to disk and then load it back from disk:

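import os
import urllib.request

# Assumed import: the snippet uses `image` without showing where it comes from.
from keras.preprocessing import image

def loadImage(URL):
    # Download the image and write it to a temporary file.
    with urllib.request.urlopen(URL) as url:
        with open('temp.jpg', 'wb') as f:
            f.write(url.read())

    # Load the file back, resize it, and clean up.
    img_path = 'temp.jpg'
    img = image.load_img(img_path, target_size=(125, 125))
    os.remove(img_path)
    x = image.img_to_array(img)
    return x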

def imageGenerator(batch_size):
    i = 0
    batch = []
    for URL in imageUrlGenerator():
        # Emit the batch once it holds exactly batch_size images.
        if i >= batch_size:
            yield batch
            batch = []
            i = 0
        batch.append(loadImage(URL))
        i += 1

This works, but I wonder whether there is a faster way to load images from the web without writing them to disk and reading them back.

asked Apr 24 '19 02:04

Cristian Desivo

1 Answer

Assuming you are actually using Keras and that image.load_img is the function you are calling, it ultimately calls PIL.Image.open. According to the documentation for PIL.Image.open, the first argument fp can be a string filename (which is what you are currently passing) or a file-like object that implements read, seek, and tell. The object returned by urllib.request.urlopen exposes all three methods, but its seek is not actually supported (the response is a non-seekable stream), so it cannot be passed in directly. However, the entire response body can be read into a BytesIO object, which does implement seek, so that should be usable. Putting this together, your loadImage function may be reduced to something like the following:

import urllib.request
from io import BytesIO

def loadImage(URL):
    with urllib.request.urlopen(URL) as url:
        # Buffer the whole response in memory so PIL gets a seekable object.
        img = image.load_img(BytesIO(url.read()), target_size=(125, 125))

    return image.img_to_array(img)

This keeps each downloaded image entirely in memory, so nothing is written to or read back from disk.
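
The seek limitation described above can be reproduced without any network access. The following is only a minimal sketch using the standard library: NonSeekableStream and to_seekable are hypothetical names introduced here for illustration, and the class merely imitates a readable but non-seekable response body rather than being a real HTTP response.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for an HTTP response body: readable, not seekable."""

    def __init__(self, payload):
        super().__init__()
        self._inner = io.BytesIO(payload)

    def readable(self):
        return True

    def readinto(self, buffer):
        # Copy up to len(buffer) bytes into the caller's buffer.
        data = self._inner.read(len(buffer))
        buffer[:len(data)] = data
        return len(data)


def to_seekable(stream):
    """Read the whole stream into memory and return a seekable BytesIO."""
    return io.BytesIO(stream.read())


stream = NonSeekableStream(b"fake image bytes")
print(stream.seekable())    # False

buffered = to_seekable(stream)
print(buffered.seekable())  # True
```

The trade-off is that the whole payload sits in memory at once, which is usually acceptable for individual images.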

answered Sep 28 '22 15:09

metatoaster

Tags: python, url, image, keras
