
Naive and easiest way to decompose independent loop into parallel threads/processes

  1. I have a loop of intensive calculations that are independent of each other, so I want to speed them up by running them in parallel on a multicore processor. What is the easiest way to do that in Python?
  2. Let’s imagine that the results of those calculations have to be combined at the end. How can I easily collect them in a list or sum them into a float variable?

Thanks for all your pedagogic answers and using python libraries ;o)

sol asked Jul 11 '11 14:07


2 Answers

In my experience, multi-threading is unlikely to speed up CPU-bound code in CPython, because the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time.

A good alternative is the multiprocessing module. This may or may not work well, depending on how much data you end up having to pass around from one process to another.

Another good alternative would be to consider using numpy for your computations (if you aren't already). If you can vectorize your code, you should be able to achieve significant speedups even on a single core. Depending on what exactly you're doing and on your build of numpy, it might even be able to transparently distribute the computations across multiple cores.

Edit: here is a complete example of using the multiprocessing module to perform a simple computation. It uses four processes to compute the squares of the numbers from zero to nine.

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    inputs = range(10)
    result = pool.map(f, inputs)
    print(result)                         # Python 3 syntax; results come back in input order

This is meant as a simple illustration. Given the trivial nature of f(), this parallel version will almost certainly be slower than computing the same thing serially.
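For the second part of the question (combining the results at the end), here is a minimal sketch along the same lines. The names heavy() and parallel_sum() are made up for illustration, and heavy() is only a stand-in for the real calculation. pool.map returns an ordinary list in the same order as the inputs, so the parent process can keep the list, sum it, or both.

```python
from multiprocessing import Pool


def heavy(x):
    # Hypothetical stand-in for one expensive, independent calculation.
    return x * x


def parallel_sum(inputs, processes=4):
    # Each input is processed in a worker process; only the return values
    # travel back to the parent, where they are collected in a list.
    with Pool(processes=processes) as pool:
        results = pool.map(heavy, inputs)  # same order as inputs
    return results, sum(results)


if __name__ == '__main__':
    results, total = parallel_sum(range(10))
    print(results)
    print(total)
```

Nothing is shared between the workers here: each one returns its value, and the summing happens once in the parent, so no locks are needed.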

NPE answered Oct 19 '22 14:10


Multicore processing is a bit difficult to do in CPython (because of the GIL). However, there is the multiprocessing module, which lets you use subprocesses (not threads) to split your work across multiple cores.

The module is relatively straightforward to use as long as your code can really be split into multiple parts and doesn't depend on shared objects. The multiprocessing documentation should be a good starting point.
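As a rough sketch of what "split into multiple parts" without shared objects can look like: the data is cut into chunks, each worker computes a partial result from its own chunk only, and the parent combines the partial results. The helper names (split, partial_sum, chunked_total) and the sum-of-squares workload are assumptions chosen for illustration, not something from the question.

```python
from multiprocessing import Pool


def partial_sum(chunk):
    # Uses only its own chunk and returns a value; no globals are touched.
    return sum(x * x for x in chunk)


def split(data, n_parts):
    # Cut data into n_parts consecutive slices of nearly equal length.
    size, extra = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def chunked_total(data, n_parts=4):
    # One chunk per worker; the parent adds up the partial results.
    with Pool(processes=n_parts) as pool:
        return sum(pool.map(partial_sum, split(data, n_parts)))


if __name__ == '__main__':
    print(chunked_total(list(range(1000))))
```

Sending a few large chunks instead of many single items also keeps the amount of data passed between processes, and the number of round trips, small.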

Martin Thurau answered Oct 19 '22 12:10