I have a simple Python script that uses two much more complicated Python scripts, and does something with the results.
I have two modules, Foo and Bar, and my code looks like the following:
import Foo
import Bar
output = []
a = Foo.get_something()
b = Bar.get_something_else()
output.append(a)
output.append(b)
Both methods take a long time to run, and neither depends on the other, so the obvious solution is to run them in parallel. How can I achieve this, but make sure that the order is maintained: Whichever one finishes first must wait for the other one to finish before the script can continue.
Let me know if I haven't made myself clear enough, I've tried to make the example code as simple as possible.
Often, concurrent tasks need to access the same data at the same time. An alternative to using explicit locks is to use a data structure that supports concurrent access. For example, the queue module provides thread-safe queues. We can also use multiprocessing.
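As a sketch of that idea (reusing the Foo and Bar modules from the question, so those names are just the question's hypotheticals), each worker thread can put its result onto a shared queue.Queue rather than writing into a dict guarded by a lock:
import queue
import threading
import Foo
import Bar

# queue.Queue is thread-safe, so no explicit lock is needed
q = queue.Queue()

def worker(name, func):
    # put a (name, result) pair so we know which task produced which value
    q.put((name, func()))

threads = [
    threading.Thread(target=worker, args=('a', Foo.get_something)),
    threading.Thread(target=worker, args=('b', Bar.get_something_else)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# both results are available once the threads have been joined
results = dict(q.get() for _ in threads)
output = [results['a'], results['b']]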
Concurrency in programming means that multiple computations are in progress during overlapping periods of time. For example, you may have multiple Python programs running on your computer, or you may connect multiple computers via a network (e.g., Ethernet) that work together towards a common objective (e.g., distributed data analytics).
Both multithreading and multiprocessing allow Python code to run concurrently. Only multiprocessing will allow your code to be truly parallel. However, if your code is I/O-heavy (like HTTP requests), then multithreading will still probably speed up your code.
In fact, a Python process cannot run threads in parallel, but it can run them concurrently by context-switching during I/O-bound operations. This limitation is enforced by the Global Interpreter Lock (GIL), which prevents threads within the same process from executing Python bytecode at the same time.
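As a minimal sketch of the question's task (assuming the Foo and Bar modules from the question), concurrent.futures puts both options behind one interface; a ThreadPoolExecutor is usually enough if get_something and get_something_else spend most of their time waiting on I/O:
from concurrent.futures import ThreadPoolExecutor
import Foo
import Bar

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit both tasks; they run concurrently in worker threads
    future_a = executor.submit(Foo.get_something)
    future_b = executor.submit(Bar.get_something_else)
    # .result() blocks until the corresponding task has finished,
    # so both are done before we build the ordered output
    output = [future_a.result(), future_b.result()]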
import multiprocessing
import Foo
import Bar

def get_a(results):
    results['a'] = Foo.get_something()

def get_b(results):
    results['b'] = Bar.get_something_else()

if __name__ == '__main__':
    # Separate processes do not share memory, so a plain dict filled in by
    # the children would stay empty in the parent; a Manager dict is shared.
    manager = multiprocessing.Manager()
    results = manager.dict()

    process_a = multiprocessing.Process(target=get_a, args=(results,))
    process_b = multiprocessing.Process(target=get_b, args=(results,))
    process_a.start()
    process_b.start()
    # wait for both processes to finish before using the results
    process_a.join()
    process_b.join()

    output = [results['a'], results['b']]
Here is the process version of your program.
NOTE: with threading there are shared data structures, so you have to worry about locking to avoid corrupting data. As Amber mentioned above, it also has the GIL (Global Interpreter Lock) problem, and since both of your tasks are CPU-intensive, threading will take more time because of the overhead of threads acquiring and releasing the lock. If your tasks were I/O-intensive, it would not matter as much.
Since separate processes do not share data structures by default (the Manager dict above is the one explicitly shared piece, and multiprocessing handles it for you), there is far less to worry about with locks, and because each process has its own interpreter the GIL does not get in the way, so you actually enjoy the real power of multiple processors.
A simple note to remember: a process is like a thread, just without shared data structures (everything works in isolation and communicates via messaging).
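To illustrate that message-passing style, here is a minimal sketch (again assuming the question's Foo and Bar modules) where the child processes send their results back over a multiprocessing.Queue instead of writing into a shared dict:
import multiprocessing
import Foo
import Bar

def get_a(q):
    q.put(('a', Foo.get_something()))

def get_b(q):
    q.put(('b', Bar.get_something_else()))

if __name__ == '__main__':
    q = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=get_a, args=(q,)),
        multiprocessing.Process(target=get_b, args=(q,)),
    ]
    for p in procs:
        p.start()
    # collect one message per process, in whatever order they finish
    results = dict(q.get() for _ in procs)
    for p in procs:
        p.join()
    output = [results['a'], results['b']]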
Check out dabeaz.com; David Beazley has given good presentations on concurrent programming.
In general, you'd use threading to do this.
First, create a thread for each thing you want to run in parallel:
import threading
import Foo
import Bar
results = {}
def get_a():
results['a'] = Foo.get_something()
a_thread = threading.Thread(target=get_a)
a_thread.start()
def get_b():
results['b'] = Bar.get_something_else()
b_thread = threading.Thread(target=get_b)
b_thread.start()
Then, to require both of them to have finished, use .join() on both:
a_thread.join()
b_thread.join()
at which point your results will be in results['a'] and results['b'], so if you wanted an ordered list:
output = [results['a'], results['b']]
Note: if both tasks are inherently CPU-intensive, you might want to consider multiprocessing instead. Due to Python's GIL, a given Python process will only ever use one CPU core, whereas multiprocessing can distribute the tasks to separate cores. However, it has a slightly higher overhead than threading, so if the tasks are less CPU-intensive it might not be as efficient.
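If you do go the multiprocessing route, one lightweight option is a ProcessPoolExecutor from concurrent.futures; here is a minimal sketch assuming the Foo and Bar modules from the question. Note that the submitted functions and their return values must be picklable for a process pool.
from concurrent.futures import ProcessPoolExecutor
import Foo
import Bar

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        # each task runs in its own worker process, so the GIL is not an issue
        future_a = executor.submit(Foo.get_something)
        future_b = executor.submit(Bar.get_something_else)
        # .result() blocks until the task is done, keeping the output ordered
        output = [future_a.result(), future_b.result()]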