How to manage python threads results?

Tags:

I am using this code:

def startThreads(arrayofkeywords):
    global i
    i = 0
    while len(arrayofkeywords):
        try:
            if i<maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i+1
                thread = doStuffWith(keyword)
                thread.start()
        except KeyboardInterrupt:
            sys.exit()
    thread.join()

for threading in python, I have almost everything done, but I dont know how to manage the results of each thread, on each thread I have an array of strings as result, how can I join all those arrays into one safely? Because, I if I try writing into a global array, two threads could be writing at the same time.

660

asked Jul 13 '10 17:07

jahmax

2 Answers

First, you actually need to save all those thread objects to call join() on them. As written, you're saving only the last one of them, and then only if there isn't an exception.

An easy way to do multithreaded programming is to give each thread all the data it needs to run, and then have it not write to anything outside that working set. If all threads follow that guideline, their writes will not interfere with each other. Then, once a thread has finished, have the main thread only aggregate the results into a global array. This is know as "fork/join parallelism."

If you subclass the Thread object, you can give it space to store that return value without interfering with other threads. Then you can do something like this:

Click to copy

class MyThread(threading.Thread):
    def __init__(self, ...):
        self.result = []
        ...

def main():
    # doStuffWith() returns a MyThread instance
    threads = [ doStuffWith(k).start() for k in arrayofkeywords[:maxThreads] ]
    for t in threads:
        t.join()
        ret = t.result
        # process return value here

Edit:

After looking around a bit, it seems like the above method isn't the preferred way to do threads in Python. The above is more of a Java-esque pattern for threads. Instead you could do something like:

Click to copy

def handler(outList)
    ...
    # Modify existing object (important!)
    outList.append(1)
    ...

def doStuffWith(keyword):
    ...
    result = []
    thread = Thread(target=handler, args=(result,))
    return (thread, result)

def main():
    threads = [ doStuffWith(k) for k in arrayofkeywords[:maxThreads] ]
    for t in threads:
        t[0].start()
    for t in threads:
        t[0].join()
        ret = t[1]
        # process return value here

answered Oct 15 '22 07:10

Karmastan

Use a Queue.Queue instance, which is intrinsically thread-safe. Each thread can .put its results to that global instance when it's done, and the main thread (when it knows all working threads are done, by .joining them for example as in @unholysampler's answer) can loop .getting each result from it, and use each result to .extend the "overall result" list, until the queue is emptied.

Edit: there are other big problems with your code -- if the maximum number of threads is less than the number of keywords, it will never terminate (you're trying to start a thread per keyword -- never less -- but if you've already started the max numbers you loop forever to no further purpose).

Consider instead using a threading pool, kind of like the one in this recipe, except that in lieu of queueing callables you'll queue the keywords -- since the callable you want to run in the thread is the same in each thread, just varying the argument. Of course that callable will be changed to peel something from the incoming-tasks queue (with .get) and .put the list of results to the outgoing-results queue when done.

To terminate the N threads you could, after all keywords, .put N "sentinels" (e.g. None, assuming no keyword can be None): a thread's callable will exit if the "keyword" it just pulled is None.

More often than not, Queue.Queue offers the best way to organize threading (and multiprocessing!) architectures in Python, be they generic like in the recipe I pointed you to, or more specialized like I'm suggesting for your use case in the last two paragraphs.

answered Oct 15 '22 06:10

Alex Martelli

Related questions
                            
                                Flask-Admin vs Flask-AppBuilder
                            
                                Progress bar for pandas.DataFrame.to_sql
                            
                                Pandas replace/dictionary slowness
                            
                                How to make a class attribute exclusive to the super class
                            
                                pyodbc the sql contains 0 parameter markers but 1 parameters were supplied' 'hy000'
                            
                                Converting NumPy array to a set takes too long
                            
                                Can you assign variables in a lambda?
                            
                                Jinja docx template, avoiding new line in nested for
                            
                                Understanding Gunicorn and Flask on Docker/Docker-Compose
                            
                                What is the difference between x_train and x_test in Keras?
                            
                                jupyter notebook not rendering in GitHub gist
                            
                                Pytorch: Why is the memory occupied by the `tensor` variable so small?
                            
                                Problem with GMM library from sklear.mixture?
                            
                                Caching in urllib2?
                            
                                How to send multiple input field values with same name?
                            
                                How can I resize the root window in Tkinter?
                            
                                SocketServer.ThreadingTCPServer - Cannot bind to address after program restart
                            
                                How to resize and draw an image using wxpython?
                            
                                How to read a string one letter at a time in python
                            
                                Python: User-Defined Exception That Proves The Rule

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to manage python threads results?

Tags:

python

arrays

multithreading

jahmax

People also ask

2 Answers

Karmastan

Alex Martelli

Recent Activity

Donate For Us