What's the fastest way to read images from URLs?

I want to make a generator that produces batches of images from URLs to train a Keras model. I have another generator that feeds me the image URLs.

What I currently do is download the image to disk and then load the image from the disk.

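import os
import urllib.request

from keras.preprocessing import image

def loadImage(URL):
    # Download the image and save it to a temporary file
    with urllib.request.urlopen(URL) as url:
        with open('temp.jpg', 'wb') as f:
            f.write(url.read())

    # Load it back from disk resized to 125x125, then delete the file
    img_path = 'temp.jpg'
    img = image.load_img(img_path, target_size=(125, 125))
    os.remove(img_path)
    x = image.img_to_array(img)
    return x

def imageGenerator(batch_size):
    # imageUrlGenerator() is the other generator, which yields image URLs
    batch = []
    for URL in imageUrlGenerator():
        batch.append(loadImage(URL))
        # Yield exactly batch_size images per batch
        if len(batch) == batch_size:
            yield batch
            batch = []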

This works, but is there a faster way to load images from the web without having to write them to disk and read them back?

Asked by Cristian Desivo, Apr 24 '19 02:04


1 Answer

Assuming you are actually using Keras and that this image.load_img is the method you are calling, it should ultimately end up calling PIL.Image.open. According to the documentation for PIL.Image.open, the first argument fp can be either a string filename (which is what you are currently passing) or a file-like object that implements the read, seek, and tell methods. Although the object returned by urllib.request.urlopen exposes all three methods, it does not actually support seeking, so it cannot be passed in directly. However, the entire response body can be read into a BytesIO object, which does support seek, so that can be used instead. Putting this together, your loadImage function can be reduced to something like the following:

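import urllib.request
from io import BytesIO

from keras.preprocessing import image

def loadImage(URL):
    # Read the whole response body into memory; BytesIO is seekable,
    # so load_img (via PIL) can open it just like a file on disk
    with urllib.request.urlopen(URL) as url:
        img = image.load_img(BytesIO(url.read()), target_size=(125, 125))

    return image.img_to_array(img)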

This keeps each downloaded image entirely in memory, so no temporary file is written to or read from disk.
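To check the in-memory idea without Keras or network access, here is a minimal sketch using only Pillow and NumPy. Both helpers are hypothetical and for illustration only: fake_download stands in for url.read() by building JPEG bytes in memory, and decode_image_bytes roughly approximates load_img followed by img_to_array (convert to RGB, resize to target_size, return a float32 array). It is not Keras's actual implementation, and its resampling filter may differ from load_img's default.

```python
from io import BytesIO

import numpy as np
from PIL import Image


def decode_image_bytes(data, target_size=(125, 125)):
    """Hypothetical helper: decode raw image bytes entirely in memory."""
    # BytesIO wraps the bytes in a seekable file-like object, so PIL can
    # open it exactly as it would open a file on disk
    with Image.open(BytesIO(data)) as img:
        img = img.convert("RGB")
        # PIL sizes are (width, height); Keras-style target_size is (height, width)
        img = img.resize((target_size[1], target_size[0]))
        return np.asarray(img, dtype=np.float32)


def fake_download():
    """Hypothetical stand-in for url.read(): JPEG bytes built in memory."""
    buf = BytesIO()
    Image.new("RGB", (300, 200), color=(255, 0, 0)).save(buf, format="JPEG")
    return buf.getvalue()


arr = decode_image_bytes(fake_download())
print(arr.shape)  # (125, 125, 3)
```

Nothing here touches the disk: the bytes go straight from memory into PIL, which is the same pattern as passing BytesIO(url.read()) to load_img.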

Answered by metatoaster, Sep 28 '22 15:09