This is probably a trivial question, but how do I parallelize the following loop in Python?
```python
# setup output lists
output1 = list()
output2 = list()
output3 = list()

for j in range(0, 10):
    # calc individual parameter value
    parameter = j * offset
    # call the calculation
    out1, out2, out3 = calc_stuff(parameter=parameter)
    # put results into correct output list
    output1.append(out1)
    output2.append(out2)
    output3.append(out3)
```
I know how to start single threads in Python but I don't know how to "collect" the results.
Multiple processes would be fine too - whatever is easiest for this case. I'm currently using Linux, but the code should run on Windows and Mac as well.
What's the easiest way to parallelize this code?
Use the joblib module to parallelize the for loop in Python. The joblib module uses multiprocessing under the hood to run work on multiple CPU cores. It provides a lightweight pipeline, with transparent caching (memoization) of results, for easy and straightforward parallel computation.

There are several common ways to parallelize Python code. You can launch several application instances or scripts to perform jobs in parallel. This approach is great when you don't need to exchange data between the parallel jobs.

Multiprocessing in Python enables the computer to utilize multiple cores of a CPU to run tasks/processes in parallel.

The general way to parallelize any operation is to take a particular function that should be run multiple times and make it run in parallel on different processors. To do this, you initialize a Pool with n worker processes and pass the function you want to parallelize to one of the Pool's parallelization methods, such as map (a minimal sketch follows).
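For instance, a minimal sketch of that pattern (assuming a toy `square` function; `Pool.map` splits the inputs across the worker processes and returns the results in input order):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # a pool of 4 worker processes; map distributes the inputs
    # across them and collects the results in order
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```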
Using multiple threads on CPython won't give you better performance for pure-Python code due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:
```python
import multiprocessing

pool = multiprocessing.Pool(4)
# note: range() needs integer arguments, so this assumes offset is an int
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
```
Note that this won't work in the interactive interpreter.
To avoid the usual FUD around the GIL: there wouldn't be any advantage to using threads for this example anyway. You want processes here, not threads, because they sidestep the GIL and a whole bunch of shared-state problems.
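Spelled out for the loop in the question, here is a sketch (calc_stuff is a placeholder for the real calculation, and the offset value is hypothetical since the question leaves it undefined) that also adds the `if __name__ == "__main__"` guard, which is required on Windows because child processes re-import the main module:

```python
import multiprocessing

def calc_stuff(parameter=None):
    # placeholder for the real calculation from the question
    return parameter, parameter**2, parameter**3

if __name__ == "__main__":
    offset = 1.5  # hypothetical value
    with multiprocessing.Pool(4) as pool:
        # each calc_stuff call returns a 3-tuple; zip(*...) transposes
        # the list of tuples into the three output sequences
        out1, out2, out3 = zip(*pool.map(calc_stuff, [j * offset for j in range(10)]))
```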
```python
from joblib import Parallel, delayed

def process(i):
    return i * i

results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
The above works beautifully on my machine (Ubuntu; the joblib package was pre-installed, but it can be installed via `pip install joblib`).
Taken from https://blog.dominodatalab.com/simple-parallelization/
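Adapted to the loop in the question (a sketch; calc_stuff and offset are assumed to exist as in the original snippet), the per-iteration tuples can then be unzipped into the three output lists:

```python
from joblib import Parallel, delayed

# run the 10 iterations across 4 worker processes
results = Parallel(n_jobs=4)(
    delayed(calc_stuff)(parameter=j * offset) for j in range(10)
)
# each element of results is an (out1, out2, out3) tuple;
# zip(*...) transposes them into the three output lists
output1, output2, output3 = (list(t) for t in zip(*results))
```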
What's the difference between joblib, multiprocessing, threading and asyncio?

- joblib in the above code uses `import multiprocessing` under the hood (and thus multiple processes, which is typically the best way to run CPU-bound work across cores - because of the GIL).
- You can let joblib use multiple threads instead of multiple processes, but this (or using `import threading` directly) is only beneficial if the threads spend considerable time on I/O (e.g. reading/writing to disk, sending an HTTP request). For I/O work, the GIL does not block the execution of another thread.
- As an alternative to threading, you can parallelise work with asyncio, and the same advice applies as for `import threading` (though in contrast to the latter, only one thread is used; on the plus side, asyncio has a lot of nice features which are helpful for async programming - see the sketch after the benchmark below).

Here is a quick benchmark showing that joblib produces better results than a plain sequential loop:

```python
import time
from joblib import Parallel, delayed

def countdown(n):
    while n > 0:
        n -= 1
    return n

# sequential baseline
t = time.time()
for _ in range(20):
    print(countdown(10**7), end=" ")
print(time.time() - t)  # takes ~10.5 seconds on medium sized Macbook Pro

# parallel version with 2 worker processes
t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)  # takes ~6.3 seconds on medium sized Macbook Pro
```
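To illustrate the asyncio bullet above, here is a minimal sketch for I/O-bound work, with `asyncio.sleep` standing in for real I/O such as an HTTP request:

```python
import asyncio

async def fetch(i):
    await asyncio.sleep(1)  # stands in for real I/O, e.g. an HTTP request
    return i * i

async def main():
    # schedule all coroutines concurrently on one thread and
    # collect their results in input order
    results = await asyncio.gather(*(fetch(i) for i in range(10)))
    print(results)

asyncio.run(main())  # finishes in ~1 second instead of ~10 sequentially
```

(For the thread-based joblib variant in the I/O-bound case, recent joblib versions accept `Parallel(n_jobs=2, prefer="threads")`.)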