Asynchronously read and process an image in python

Tags: python, asynchronous, image-processing, numpy, scipy

Context

I often find myself in the following situation:

I have a list of image filenames I need to process
I read each image sequentially, using for instance scipy.misc.imread
Then I do some kind of processing on each image and return a result
I save the result along with the image filename into a Shelf

The problem is that simply reading an image takes a non-negligible amount of time, sometimes comparable to or even longer than the image processing itself.

Question

So I was thinking that, ideally, I could read image n + 1 while processing image n. Or, even better, read and process multiple images at once in an automagically determined optimal way?

I have read about multiprocessing, threads, Twisted, gevent and the like, but I can't figure out which one to use or how to implement this idea. Does anyone have a solution to this kind of issue?
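With threads, one of the options mentioned above, the "read image n + 1 while processing image n" idea can be sketched using a single background reader thread from the standard library's concurrent.futures. This is a minimal sketch, not code from the original post: `prefetch_map` and its `load`/`process` arguments are hypothetical names (in the question's setting, `load` would be the image reader and `process` would be `process_image`). It assumes the reader spends most of its time in I/O or in C code that releases the GIL, which is what allows the thread to overlap with the computation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def prefetch_map(load, process, items):
    """Yield (item, process(load(item))) for each item, in order.

    One background thread loads item n + 1 while the calling thread
    processes item n, so loading and computation overlap. At most two
    loaded items are held in memory at any time.
    """
    items = list(items)
    if not items:
        return
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load, items[0])
        for i, item in enumerate(items):
            data = pending.result()  # wait for item i (re-raises load errors)
            if i + 1 < len(items):
                # Start loading the next item before processing this one.
                pending = loader.submit(load, items[i + 1])
            yield item, process(data)


if __name__ == "__main__":
    def slow_load(name):  # hypothetical stand-in for an image reader
        time.sleep(0.1)
        return len(name)

    def slow_process(size):  # hypothetical stand-in for process_image
        time.sleep(0.1)
        return size * 2

    start = time.perf_counter()
    for name, result in prefetch_map(slow_load, slow_process,
                                     ["a.png", "bb.png", "ccc.png"]):
        print(name, result)
    # Each load after the first overlaps a process call, so this should take
    # noticeably less than the 0.6 s a purely sequential loop would need.
    print("elapsed: %.2f s" % (time.perf_counter() - start))
```

This only overlaps one read with one computation; it does not spread the computation over several cores, which is what a process pool is for.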

Minimal example

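import scipy.misc
import scipy.ndimage

# generate a list of images
# (scipy.misc.lena, imread and imsave come from old SciPy releases; they have since been removed)
scipy.misc.imsave("lena.png", scipy.misc.lena())
files = ['lena.png'] * 100

# a simple image processing task: count connected regions brighter than threshold
def process_image(im, threshold=128):
    label, n = scipy.ndimage.label(im > threshold)
    return n

# my current main loop: read, then process, strictly one image at a time
for f in files:
    im = scipy.misc.imread(f)
    print(process_image(im))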


asked Sep 18 '12 09:09

Nicolas Barbey

1 Answer

Philip's answer is good, but it only creates a couple of processes (one reading, one computing), which will hardly max out a modern system with more than two cores. Here's an alternative using multiprocessing.Pool (specifically its map method), which creates worker processes that each do both the reading and the computing, and which should make better use of all the cores you have available (assuming there are more files than cores).

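#!/usr/bin/env python

import multiprocessing
import scipy.misc
import scipy.ndimage

class Processor:
    """Reads one image and counts the connected regions above a threshold.

    A module-level class instance can be pickled and sent to the worker
    processes, unlike a lambda.
    """
    def __init__(self, threshold):
        self._threshold = threshold

    def __call__(self, filename):
        # Each worker does both the read (I/O) and the labelling (CPU).
        im = scipy.misc.imread(filename)
        label, n = scipy.ndimage.label(im > self._threshold)
        return n

def main():
    scipy.misc.imsave("lena.png", scipy.misc.lena())
    files = ['lena.png'] * 100

    proc = Processor(128)
    # With no arguments, Pool starts one worker per CPU; use processes=N to override.
    pool = multiprocessing.Pool()
    # map blocks until every file is done and returns results in input order.
    results = pool.map(proc, files)
    pool.close()
    pool.join()

    print(results)

if __name__ == "__main__":
    main()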
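If, as in the question, the results should end up in a shelf keyed by filename, one variation is to use Pool.imap_unordered instead of map, so each result is stored as soon as its worker finishes rather than after the whole batch. This is a sketch under stated assumptions, not part of the answer: it targets Python 3 (Pool and shelve.open are used as context managers), `KeyedProcessor` and `process_into_shelf` are hypothetical names, and `func` could be an instance of the `Processor` class above. The hypothetical `pool_factory` parameter exists only so the same code can also be run with `multiprocessing.pool.ThreadPool`, which has the same interface.

```python
import multiprocessing
import shelve


class KeyedProcessor:
    """Picklable wrapper that returns (filename, func(filename)).

    imap_unordered yields results in completion order, not input order,
    so each result carries its filename to know where it belongs.
    """

    def __init__(self, func):
        self._func = func

    def __call__(self, filename):
        return filename, self._func(filename)


def process_into_shelf(func, files, shelf_path, processes=None, chunksize=4,
                       pool_factory=multiprocessing.Pool):
    """Run func over files in a worker pool and store each result in a shelf.

    Assumes filenames are unique: a repeated filename overwrites its
    earlier entry in the shelf.
    """
    worker = KeyedProcessor(func)
    with pool_factory(processes=processes) as pool, shelve.open(shelf_path) as shelf:
        # Results arrive as soon as any worker finishes a chunk of files;
        # only this (parent) process ever writes to the shelf.
        for filename, result in pool.imap_unordered(worker, files, chunksize=chunksize):
            shelf[filename] = result


if __name__ == "__main__":
    # len is only a placeholder for a real per-file function.
    process_into_shelf(len, ["a.png", "bb.png", "ccc.png"], "results")
```

Keeping all shelf writes in the parent matters because the shelve module does not support concurrent writers.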

If I increase the number of images to 500 and use the processes=N argument to Pool, I get the following runtimes

Processes   Runtime
   1         6.2s
   2         3.2s
   4         1.8s
   8         1.5s

on my quad-core hyperthreaded i7.

In more realistic use cases (i.e. actually different images), your processes might spend more time waiting for image data to load from storage (in my testing, the images load virtually instantaneously from the disk cache). In that case it might be worth explicitly creating more processes than cores to get more overlap of computation and loading. Only your own scalability testing on a realistic workload and hardware can tell you what's actually best for you, though.
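A minimal harness for that kind of scalability test might look like the sketch below. It is not from the answer: `time_pool_sizes` and `busy_work` are hypothetical names, and `busy_work` is only a CPU-bound stand-in for a real read-and-process function such as a `Processor` instance. It times `pool.map` for several pool sizes, including more workers than cores, and leaves pool start-up out of the measurement. As in the previous sketches, `pool_factory` is a hypothetical parameter that allows swapping in `multiprocessing.pool.ThreadPool`.

```python
import multiprocessing
import os
import time


def time_pool_sizes(func, items, sizes, pool_factory=multiprocessing.Pool):
    """Return {pool_size: seconds} for pool.map(func, items) at each size.

    Pool start-up is excluded: the clock starts after the workers exist.
    """
    timings = {}
    items = list(items)
    for n in sizes:
        with pool_factory(processes=n) as pool:
            start = time.perf_counter()
            pool.map(func, items)
            timings[n] = time.perf_counter() - start
    return timings


def busy_work(seed):
    """Hypothetical CPU-bound stand-in for reading and processing one image."""
    total = 0
    for i in range(200_000):
        total += (i * seed) % 7
    return total


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Include sizes above the core count: with real images, extra workers
    # may overlap waiting on storage with computation.
    sizes = sorted({1, 2, cores, 2 * cores})
    for n, seconds in sorted(time_pool_sizes(busy_work, range(500), sizes).items()):
        print("%3d processes: %.2f s" % (n, seconds))
```

With this purely CPU-bound stand-in, sizes above the core count are not expected to help; the point of the harness is to run it with the real per-image function on realistic files.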

answered Oct 02 '22 17:10

timday
