Let's say I have a big list of music files of varying length that need to be converted, or images of varying sizes that need to be resized, or something like that. The order doesn't matter, so it is perfect for splitting across multiple processors.
If I use multiprocessing.Pool's map function, it seems like all the work is divided up ahead of time, without taking into account that some files may take longer to process than others.
What happens is that if I have 12 processors, then near the end of processing, 1 or 2 processors will still have 2 or 3 files left to work through while other processors that could be utilized sit idle.
Is there some sort of queue implementation that can keep all processors loaded until there is no more work left to do?
There is a Queue class within the multiprocessing module specifically for this purpose.
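Here is a minimal sketch of that pattern (process_file and the files list are placeholders for your actual work): each worker pulls the next item off a shared Queue as soon as it finishes the previous one, so no processor sits idle while work remains.

import multiprocessing

def process_file(path):
    print('processing', path)   # stand-in for the real conversion/resize

def worker(queue):
    # Each worker keeps pulling tasks until it sees the sentinel value None.
    for path in iter(queue.get, None):
        process_file(path)

if __name__ == '__main__':
    files = ['a.mp3', 'b.mp3', 'c.mp3']   # hypothetical inputs
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(queue,))
             for _ in range(multiprocessing.cpu_count())]
    for p in procs:
        p.start()
    for path in files:
        queue.put(path)
    for _ in procs:
        queue.put(None)   # one sentinel per worker so each one exits
    for p in procs:
        p.join()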
Edit: If you are looking for a complete framework for parallel computing which features a map() function using a task queue, have a look at the parallel computing facilities of IPython. In particular, you can use the TaskClient.map() function to get a load-balanced mapping onto the available processors.
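A rough sketch of the idea; note the import path below is an assumption based on the older IPython releases that shipped this API, and process_image and images are placeholders:

# assumption: TaskClient lives in IPython.kernel.client (0.10-era IPython)
from IPython.kernel import client

tc = client.TaskClient()
# map() hands each call to whichever engine is free next,
# instead of partitioning the work up front
results = tc.map(process_image, images)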
This is trivial to do with jug:
from glob import glob
from jug import Task

def process_image(img):
    ...  # do the actual conversion/resize here

images = glob('*.jpg')
for im in images:
    Task(process_image, im)
Now, just run jug execute a few times to spawn worker processes.
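Each jug execute worker pulls the next pending Task from jug's shared store (by default, the filesystem), so the load balances itself across however many workers you start.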