Given a large list (1,000+ items) of completely independent objects that each need to be processed by some expensive function (~5 minutes each), what is the best way to distribute the work across other cores? Theoretically, I could just cut the list into equal parts, serialize the data with cPickle (takes a few seconds), and launch a new Python process for each chunk--and it may come to that if I intend to use multiple computers--but this feels like more of a hack than anything. Surely there is a more integrated way to do this using a multiprocessing library? Am I over-thinking this?
Thanks.
This sounds like a good use case for a multiprocessing.Pool; depending on exactly what you're doing, it could be as simple as
import multiprocessing

pool = multiprocessing.Pool(num_procs)  # num_procs: how many worker processes to use
results = pool.map(the_function, list_of_objects)
pool.close()
pool.join()
This will pickle each object in the list independently. If that's a problem, there are various ways around it (each with its own drawbacks, and I don't know whether any of them work on Windows). Since your computation times are fairly long, the pickling overhead is probably irrelevant anyway.
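For example, one such workaround (on platforms that fork, so not Windows) is to keep the big list in a module-level global and pass only indices to the workers, so the objects are inherited by fork rather than pickled per task. This is just a sketch; build_list_of_objects, the_function and num_procs are placeholder names:

import multiprocessing

big_list = build_list_of_objects()  # hypothetical loader; must run before the Pool is created

def work_on_index(i):
    # each forked worker reads its inherited copy of big_list
    return the_function(big_list[i])

if __name__ == "__main__":
    pool = multiprocessing.Pool(num_procs)
    results = pool.map(work_on_index, range(len(big_list)))
    pool.close()
    pool.join()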
Since you're looking at roughly 5 minutes x 1,000 items = about 83 hours of CPU time divided across your cores, you probably want to save partial results along the way and print some progress information. The easiest approach is probably to have the function you call save its own results to a file or database or whatever; if that's not practical, you could instead use apply_async in a loop and handle the results as they come in (see the sketch below).
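Here's a rough sketch of the apply_async approach, writing each result to its own file as it arrives so partial progress survives a crash. The wrapper, callback and file naming are placeholders, not a fixed recipe:

import multiprocessing
import pickle

def wrapped(index, obj):
    # return the index along with the result so the callback knows which item finished
    return index, the_function(obj)

def handle_result(args):
    # the callback runs in the parent process as each task completes
    index, result = args
    with open("result_%05d.pkl" % index, "wb") as f:
        pickle.dump(result, f)
    print("finished item %d of %d" % (index, len(list_of_objects)))

if __name__ == "__main__":
    pool = multiprocessing.Pool(num_procs)
    for i, obj in enumerate(list_of_objects):
        pool.apply_async(wrapped, (i, obj), callback=handle_result)
    pool.close()
    pool.join()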
You could also look into something like joblib to handle this for you; I'm not very familiar with it, but it seems to address the same problem.
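If you go that route, the call would look roughly like this (Parallel and delayed are joblib's public API; the_function, list_of_objects and num_procs are the same placeholders as above, and verbose=10 prints periodic progress):

from joblib import Parallel, delayed

results = Parallel(n_jobs=num_procs, verbose=10)(
    delayed(the_function)(obj) for obj in list_of_objects
)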