I am using this code:
def startThreads(arrayofkeywords):
global i
i = 0
while len(arrayofkeywords):
try:
if i<maxThreads:
keyword = arrayofkeywords.pop(0)
i = i+1
thread = doStuffWith(keyword)
thread.start()
except KeyboardInterrupt:
sys.exit()
thread.join()
for threading in python, I have almost everything done, but I dont know how to manage the results of each thread, on each thread I have an array of strings as result, how can I join all those arrays into one safely? Because, I if I try writing into a global array, two threads could be writing at the same time.
A simple way to do this is to associate I/O-bound tasks with multithreading (e.g., tasks like disk read/writes, API calls and networking that are limited by the I/O subsystem), and associate CPU-bound tasks with multiprocessing (e.g., tasks like image processing or data analytics that are limited by the CPU's speed).
This can be achieved by creating a new thread instance and using the “target” argument to specify the watchdog function. The watchdog can be given an appropriate name, like “Watchdog” and be configured to be a daemon thread by setting the “daemon” argument to True.
Generally, Python only uses one thread to execute the set of written statements. This means that in python only one thread will be executed at a time.
The Python process will terminate once all (non background threads) are terminated.
First, you actually need to save all those thread
objects to call join()
on them. As written, you're saving only the last one of them, and then only if there isn't an exception.
An easy way to do multithreaded programming is to give each thread all the data it needs to run, and then have it not write to anything outside that working set. If all threads follow that guideline, their writes will not interfere with each other. Then, once a thread has finished, have the main thread only aggregate the results into a global array. This is know as "fork/join parallelism."
If you subclass the Thread object, you can give it space to store that return value without interfering with other threads. Then you can do something like this:
class MyThread(threading.Thread):
def __init__(self, ...):
self.result = []
...
def main():
# doStuffWith() returns a MyThread instance
threads = [ doStuffWith(k).start() for k in arrayofkeywords[:maxThreads] ]
for t in threads:
t.join()
ret = t.result
# process return value here
Edit:
After looking around a bit, it seems like the above method isn't the preferred way to do threads in Python. The above is more of a Java-esque pattern for threads. Instead you could do something like:
def handler(outList)
...
# Modify existing object (important!)
outList.append(1)
...
def doStuffWith(keyword):
...
result = []
thread = Thread(target=handler, args=(result,))
return (thread, result)
def main():
threads = [ doStuffWith(k) for k in arrayofkeywords[:maxThreads] ]
for t in threads:
t[0].start()
for t in threads:
t[0].join()
ret = t[1]
# process return value here
Use a Queue.Queue
instance, which is intrinsically thread-safe. Each thread can .put
its results to that global instance when it's done, and the main thread (when it knows all working threads are done, by .join
ing them for example as in @unholysampler's answer) can loop .get
ting each result from it, and use each result to .extend
the "overall result" list, until the queue is emptied.
Edit: there are other big problems with your code -- if the maximum number of threads is less than the number of keywords, it will never terminate (you're trying to start a thread per keyword -- never less -- but if you've already started the max numbers you loop forever to no further purpose).
Consider instead using a threading pool, kind of like the one in this recipe, except that in lieu of queueing callables you'll queue the keywords -- since the callable you want to run in the thread is the same in each thread, just varying the argument. Of course that callable will be changed to peel something from the incoming-tasks queue (with .get
) and .put
the list of results to the outgoing-results queue when done.
To terminate the N threads you could, after all keywords, .put
N "sentinels" (e.g. None
, assuming no keyword can be None
): a thread's callable will exit if the "keyword" it just pulled is None
.
More often than not, Queue.Queue
offers the best way to organize threading (and multiprocessing!) architectures in Python, be they generic like in the recipe I pointed you to, or more specialized like I'm suggesting for your use case in the last two paragraphs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With