
How do I parallelize a simple Python loop?

This is probably a trivial question, but how do I parallelize the following loop in Python?

# setup output lists
output1 = list()
output2 = list()
output3 = list()

for j in range(0, 10):
    # calc individual parameter value
    parameter = j * offset
    # call the calculation
    out1, out2, out3 = calc_stuff(parameter=parameter)

    # put results into correct output list
    output1.append(out1)
    output2.append(out2)
    output3.append(out3)

I know how to start single threads in Python, but I don't know how to "collect" the results.

Multiple processes would be fine too - whatever is easiest for this case. I'm currently using Linux, but the code should run on Windows and Mac as well.

What's the easiest way to parallelize this code?

asked Mar 20 '12 by memyself


2 Answers

Using multiple threads on CPython won't give you better performance for pure-Python code due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:

import multiprocessing

pool = multiprocessing.Pool(4)
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))

Note that this won't work in the interactive interpreter: multiprocessing has to be able to import the module that defines calc_stuff so that the worker processes can find it.
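For completeness, here is a minimal standalone-script sketch of the same idea. calc_stuff and the offset value are placeholders, since the question never defines them; the __main__ guard matters on Windows (and with the spawn start method in general), because worker processes re-import the script:

import multiprocessing

offset = 0.5  # placeholder value; the question never defines offset

def calc_stuff(parameter):
    # placeholder calculation returning three values, like in the question
    return parameter, parameter ** 2, parameter ** 3

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        results = pool.map(calc_stuff, [j * offset for j in range(10)])
    # transpose the list of 3-tuples into the three output lists
    output1, output2, output3 = map(list, zip(*results))
    print(output1, output2, output3)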

To avoid the usual FUD around the GIL: there wouldn't be any advantage to using threads for this example anyway. You want processes here, not threads, because for CPU-bound pure-Python work only processes actually run in parallel.
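If you prefer the standard library's newer interface (my addition, not part of the original answer), concurrent.futures gives you the same process-based pattern with a context manager:

from concurrent.futures import ProcessPoolExecutor

def square(x):
    # stand-in for the real per-item calculation
    return x * x

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(square, range(10)))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]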

answered Sep 19 '22 by Sven Marnach


from joblib import Parallel, delayed

def process(i):
    return i * i

results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The above works beautifully on my machine (Ubuntu; the joblib package was pre-installed, but it can be installed via pip install joblib).

Taken from https://blog.dominodatalab.com/simple-parallelization/
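A small usage note (my addition, not from the linked post): joblib also accepts n_jobs=-1 to use all available CPU cores, which is often what you want for CPU-bound work:

from joblib import Parallel, delayed

def square(i):
    return i * i

# n_jobs=-1 tells joblib to use one worker per available CPU core
results = Parallel(n_jobs=-1)(delayed(square)(i) for i in range(10))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]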


Edit on Mar 31, 2021: On joblib, multiprocessing, threading and asyncio

  • joblib in the above code uses import multiprocessing under the hood (and thus multiple processes, which is typically the best way to run CPU-bound work across cores - because of the GIL)
  • You can let joblib use multiple threads instead of multiple processes, but this (or using import threading directly) is only beneficial if the threads spend considerable time on I/O (e.g. reading/writing to disk, sending an HTTP request). For I/O work, the GIL does not block the execution of another thread; see the thread-based sketch after the timing snippet below
  • Since Python 3.7, as an alternative to threading, you can parallelise work with asyncio, but the same advice applies as for import threading (though in contrast to the latter, only one thread is used; on the plus side, asyncio has a lot of nice features that are helpful for async programming)
  • Using multiple processes incurs overhead. Think about it: typically, each process needs to initialise/load everything you need to run your calculation. You should check yourself whether the above code snippet improves your wall time. Here is another snippet, for which I confirmed that joblib produces better results:
import time
from joblib import Parallel, delayed

def countdown(n):
    while n > 0:
        n -= 1
    return n

t = time.time()
for _ in range(20):
    print(countdown(10**7), end=" ")
print(time.time() - t)
# takes ~10.5 seconds on a medium-sized MacBook Pro

t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)
# takes ~6.3 seconds on a medium-sized MacBook Pro
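And here is the thread-based sketch promised above. It assumes a reasonably recent joblib (the prefer="threads" option was added in joblib 0.12), and the URL is just a placeholder; threads pay off here because the work is I/O-bound and the GIL is released while waiting on the network:

from joblib import Parallel, delayed
import urllib.request

def fetch_length(url):
    # I/O-bound work: the GIL is released while waiting on the network
    with urllib.request.urlopen(url) as response:
        return len(response.read())

urls = ["https://example.com"] * 5  # placeholder URLs for illustration
lengths = Parallel(n_jobs=5, prefer="threads")(delayed(fetch_length)(u) for u in urls)
print(lengths)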
answered Sep 18 '22 by tyrex