This is probably a trivial question, but how do I parallelize the following loop in Python?
```python
# setup output lists
output1 = list()
output2 = list()
output3 = list()

for j in range(0, 10):
    # calc individual parameter value
    parameter = j * offset
    # call the calculation
    out1, out2, out3 = calc_stuff(parameter=parameter)
    # put results into correct output list
    output1.append(out1)
    output2.append(out2)
    output3.append(out3)
```
I know how to start single threads in Python but I don't know how to "collect" the results.
Multiple processes would be fine too - whatever is easiest for this case. I'm currently using Linux, but the code should run on Windows and Mac as well.
What's the easiest way to parallelize this code?
Use the joblib module to parallelize the for loop in Python. The joblib module uses multiprocessing under the hood to run work on multiple CPU cores. It provides a lightweight pipeline, with transparent caching (memoization) of results, for easy and straightforward parallel computation.

There are several common ways to parallelize Python code. You can launch several application instances or scripts to perform jobs in parallel. This approach is great when you don't need to exchange data between the parallel jobs.

Multiprocessing in Python enables the computer to utilize multiple cores of a CPU to run tasks/processes in parallel.

The general way to parallelize any operation is to take a particular function that should be run multiple times and make it run in parallel on different processors. To do this, you initialize a Pool with n worker processes and pass the function you want to parallelize to one of the Pool's parallelization methods, such as map (a minimal sketch follows).
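For instance, a minimal sketch of that pattern (assuming a toy `square` function; `Pool.map` splits the inputs across the worker processes and returns the results in input order):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # a pool of 4 worker processes; map distributes the inputs
    # across them and collects the results in order
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```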
Using multiple threads on CPython won't give you better performance for pure-Python code due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:
```python
import multiprocessing

pool = multiprocessing.Pool(4)
# note: range() needs integer arguments, so this assumes offset is an int
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
```
Note that this won't work in the interactive interpreter.
To avoid the usual FUD around the GIL: there wouldn't be any advantage to using threads for this example anyway. You want processes here, not threads, because they sidestep the GIL and a whole bunch of shared-state problems.
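Spelled out for the loop in the question, here is a sketch (calc_stuff is a placeholder for the real calculation, and the offset value is hypothetical since the question leaves it undefined) that also adds the `if __name__ == "__main__"` guard, which is required on Windows because child processes re-import the main module:

```python
import multiprocessing

def calc_stuff(parameter=None):
    # placeholder for the real calculation from the question
    return parameter, parameter**2, parameter**3

if __name__ == "__main__":
    offset = 1.5  # hypothetical value
    with multiprocessing.Pool(4) as pool:
        # each calc_stuff call returns a 3-tuple; zip(*...) transposes
        # the list of tuples into the three output sequences
        out1, out2, out3 = zip(*pool.map(calc_stuff, [j * offset for j in range(10)]))
```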
```python
from joblib import Parallel, delayed

def process(i):
    return i * i

results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
The above works beautifully on my machine (Ubuntu; the joblib package was pre-installed, but it can be installed via `pip install joblib`).
Taken from https://blog.dominodatalab.com/simple-parallelization/
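Adapted to the loop in the question (a sketch; calc_stuff and offset are assumed to exist as in the original snippet), the per-iteration tuples can then be unzipped into the three output lists:

```python
from joblib import Parallel, delayed

# run the 10 iterations across 4 worker processes
results = Parallel(n_jobs=4)(
    delayed(calc_stuff)(parameter=j * offset) for j in range(10)
)
# each element of results is an (out1, out2, out3) tuple;
# zip(*...) transposes them into the three output lists
output1, output2, output3 = (list(t) for t in zip(*results))
```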
What's the difference between joblib, multiprocessing, threading and asyncio?

- joblib in the above code uses `import multiprocessing` under the hood (and thus multiple processes, which is typically the best way to run CPU-bound work across cores - because of the GIL).
- You can let joblib use multiple threads instead of multiple processes, but this (or using `import threading` directly) is only beneficial if the threads spend considerable time on I/O (e.g. reading/writing to disk, sending an HTTP request). For I/O work, the GIL does not block the execution of another thread.
- As an alternative to threading, you can parallelise work with asyncio, and the same advice applies as for `import threading` (though in contrast to the latter, only one thread is used; on the plus side, asyncio has a lot of nice features which are helpful for async programming - see the sketch after the benchmark below).

Here is a quick benchmark showing that joblib produces better results than a plain sequential loop:

```python
import time
from joblib import Parallel, delayed

def countdown(n):
    while n > 0:
        n -= 1
    return n

# sequential baseline
t = time.time()
for _ in range(20):
    print(countdown(10**7), end=" ")
print(time.time() - t)  # takes ~10.5 seconds on medium sized Macbook Pro

# parallel version with 2 worker processes
t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)  # takes ~6.3 seconds on medium sized Macbook Pro
```
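To illustrate the asyncio bullet above, here is a minimal sketch for I/O-bound work, with `asyncio.sleep` standing in for real I/O such as an HTTP request:

```python
import asyncio

async def fetch(i):
    await asyncio.sleep(1)  # stands in for real I/O, e.g. an HTTP request
    return i * i

async def main():
    # schedule all coroutines concurrently on one thread and
    # collect their results in input order
    results = await asyncio.gather(*(fetch(i) for i in range(10)))
    print(results)

asyncio.run(main())  # finishes in ~1 second instead of ~10 sequentially
```

(For the thread-based joblib variant in the I/O-bound case, recent joblib versions accept `Parallel(n_jobs=2, prefer="threads")`.)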