How to parallelize list-comprehension calculations in Python?

Both list comprehensions and map calculations should, at least in theory, be relatively easy to parallelize: each calculation inside a list comprehension can be done independently of all the other elements. For example, in the expression

[ x*x for x in range(1000) ] 

each x*x calculation could, at least in theory, be done in parallel.

My question is: Is there any Python module, implementation, or programming trick to parallelize a list-comprehension calculation (in order to use all 16/32/... cores, or to distribute the calculation over a computer grid or a cloud)?

asked Mar 08 '11 by phynfo

People also ask

Does Python parallelize list comprehension?

No. A list comprehension is evaluated sequentially as a single, C-optimized loop. If you pull the body out and parallelize it, it is no longer a list comprehension; it is just a good old-fashioned map (or MapReduce) operation.
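To make that equivalence concrete, here is a small sketch showing that a comprehension is just a sequential map; the square function name is only for illustration. It is the map() form that parallel pool APIs accept:

def square(x):
    return x * x

# The comprehension and its map() equivalent produce the same list;
# parallel APIs like Pool.map work on the map() form.
comprehension = [x * x for x in range(1000)]
mapped = list(map(square, range(1000)))
assert comprehension == mapped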

Can you parallelize in Python?

There are several common ways to parallelize Python code. One is to launch several application instances, or several copies of a script, to perform jobs in parallel. This approach works well when the parallel jobs do not need to exchange data.

How do you parallelize a task in Python?

The multiprocessing module enables Python to utilize multiple cores of a CPU to run tasks/processes in parallel, sidestepping the Global Interpreter Lock that limits threads.


2 Answers

As Ken said, the comprehension itself can't be parallelized, but with the multiprocessing module (standard since Python 2.6) it's pretty easy to parallelize the computation it performs.

import multiprocessing

try:
    cpus = multiprocessing.cpu_count()
except NotImplementedError:
    cpus = 2   # arbitrary default

def square(n):
    return n * n

pool = multiprocessing.Pool(processes=cpus)
print(pool.map(square, range(1000)))

There are also examples in the documentation that show how to do this using Managers, which should allow for distributed computations as well.
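For completeness, a similar sketch using the concurrent.futures API, which has been part of the standard library since Python 3.2; ProcessPoolExecutor.map keeps the map-like shape of the original comprehension:

from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == "__main__":
    # Executor.map distributes the calls across worker processes
    # and returns the results in input order.
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(square, range(1000)))
    print(results[:5])  # → [0, 1, 4, 9, 16]

The `if __name__ == "__main__"` guard matters: worker processes re-import the main module, and the guard prevents them from recursively spawning new pools.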

answered Sep 23 '22 by Mahmoud Abdelkader


For shared-memory parallelism, I recommend joblib:

from joblib import delayed, Parallel

NUM_CPUS = 4   # or -1 to use all available cores

def square(x):
    return x * x

values = Parallel(n_jobs=NUM_CPUS)(delayed(square)(x) for x in range(1000))
answered Sep 25 '22 by Fred Foo