Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to manage python threads results?

I am using this code:

def startThreads(arrayofkeywords):
    global i
    i = 0
    while len(arrayofkeywords):
        try:
            if i<maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i+1
                thread = doStuffWith(keyword)
                thread.start()
        except KeyboardInterrupt:
            sys.exit()
    thread.join()

for threading in python, I have almost everything done, but I dont know how to manage the results of each thread, on each thread I have an array of strings as result, how can I join all those arrays into one safely? Because, I if I try writing into a global array, two threads could be writing at the same time.

like image 660
jahmax Avatar asked Jul 13 '10 17:07

jahmax


People also ask

How do you manage multiple threads in Python?

A simple way to do this is to associate I/O-bound tasks with multithreading (e.g., tasks like disk read/writes, API calls and networking that are limited by the I/O subsystem), and associate CPU-bound tasks with multiprocessing (e.g., tasks like image processing or data analytics that are limited by the CPU's speed).

How do I monitor a thread in Python?

This can be achieved by creating a new thread instance and using the “target” argument to specify the watchdog function. The watchdog can be given an appropriate name, like “Watchdog” and be configured to be a daemon thread by setting the “daemon” argument to True.

How many threads is too much Python?

Generally, Python only uses one thread to execute the set of written statements. This means that in python only one thread will be executed at a time.

What happens to a thread when it finishes Python?

The Python process will terminate once all (non background threads) are terminated.


2 Answers

First, you actually need to save all those thread objects to call join() on them. As written, you're saving only the last one of them, and then only if there isn't an exception.

An easy way to do multithreaded programming is to give each thread all the data it needs to run, and then have it not write to anything outside that working set. If all threads follow that guideline, their writes will not interfere with each other. Then, once a thread has finished, have the main thread only aggregate the results into a global array. This is know as "fork/join parallelism."

If you subclass the Thread object, you can give it space to store that return value without interfering with other threads. Then you can do something like this:

class MyThread(threading.Thread):
    def __init__(self, ...):
        self.result = []
        ...

def main():
    # doStuffWith() returns a MyThread instance
    threads = [ doStuffWith(k).start() for k in arrayofkeywords[:maxThreads] ]
    for t in threads:
        t.join()
        ret = t.result
        # process return value here

Edit:

After looking around a bit, it seems like the above method isn't the preferred way to do threads in Python. The above is more of a Java-esque pattern for threads. Instead you could do something like:

def handler(outList)
    ...
    # Modify existing object (important!)
    outList.append(1)
    ...

def doStuffWith(keyword):
    ...
    result = []
    thread = Thread(target=handler, args=(result,))
    return (thread, result)

def main():
    threads = [ doStuffWith(k) for k in arrayofkeywords[:maxThreads] ]
    for t in threads:
        t[0].start()
    for t in threads:
        t[0].join()
        ret = t[1]
        # process return value here
like image 82
Karmastan Avatar answered Oct 15 '22 07:10

Karmastan


Use a Queue.Queue instance, which is intrinsically thread-safe. Each thread can .put its results to that global instance when it's done, and the main thread (when it knows all working threads are done, by .joining them for example as in @unholysampler's answer) can loop .getting each result from it, and use each result to .extend the "overall result" list, until the queue is emptied.

Edit: there are other big problems with your code -- if the maximum number of threads is less than the number of keywords, it will never terminate (you're trying to start a thread per keyword -- never less -- but if you've already started the max numbers you loop forever to no further purpose).

Consider instead using a threading pool, kind of like the one in this recipe, except that in lieu of queueing callables you'll queue the keywords -- since the callable you want to run in the thread is the same in each thread, just varying the argument. Of course that callable will be changed to peel something from the incoming-tasks queue (with .get) and .put the list of results to the outgoing-results queue when done.

To terminate the N threads you could, after all keywords, .put N "sentinels" (e.g. None, assuming no keyword can be None): a thread's callable will exit if the "keyword" it just pulled is None.

More often than not, Queue.Queue offers the best way to organize threading (and multiprocessing!) architectures in Python, be they generic like in the recipe I pointed you to, or more specialized like I'm suggesting for your use case in the last two paragraphs.

like image 45
Alex Martelli Avatar answered Oct 15 '22 06:10

Alex Martelli