 

Terminate multiple threads when any thread completes a task

I am new to both Python and to threads. I have written Python code which acts as a web crawler and searches sites for a specific keyword. My question is: how can I use threads to run three different instances of my class at the same time? When one of the instances finds the keyword, all three must stop crawling the web. Here is some code:

class Crawler:
    def __init__(self):
        pass  # the actual code for finding the keyword

def main():
    Crawl = Crawler()

if __name__ == "__main__":
    main()

How can I use threads to have Crawler do three different crawls at the same time?

asked Jun 08 '11 by user446836

People also ask

Does terminating a process terminate threads?

A thread automatically terminates when it returns from its entry-point routine. A thread can also explicitly terminate itself or terminate any other thread in the process, using a mechanism called cancellation.

What method is used to terminate a thread?

Abort(Object) This method raises a ThreadAbortException in the thread on which it is invoked, to begin the process of terminating the thread while also providing exception information about the thread termination. Generally, this method is used to terminate the thread.

What happens when a thread terminates?

When the main thread returns (i.e., you return from the main function), it terminates the entire process. This includes all other threads. The same thing happens when you call exit . You can avoid this by calling pthread_exit .
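Python behaves similarly for daemon threads: the interpreter waits for ordinary (non-daemon) threads to finish, but daemon threads are killed abruptly when the main thread exits. A minimal sketch of that behaviour (the sleep durations are arbitrary):

import threading
import time

def background_task():
    # This daemon thread is killed abruptly when the main thread exits,
    # so the loop never reaches its tenth iteration.
    for i in range(10):
        print("tick %d" % i)
        time.sleep(1)

t = threading.Thread(target=background_task)
t.daemon = True  # must be set before start()
t.start()

time.sleep(2.5)
print("main thread exiting; the daemon dies with it")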


1 Answer

There doesn't seem to be a (simple) way to terminate a thread in Python.

Here is a simple example of running multiple HTTP requests in parallel:

import threading

def crawl():
    import urllib2
    data = urllib2.urlopen("http://www.google.com/").read()
    print "Read google.com"

threads = []

for n in range(10):
    thread = threading.Thread(target=crawl)
    thread.start()
    threads.append(thread)

# wait until all of the threads are finished
print "Waiting..."

for thread in threads:
    thread.join()

print "Complete."
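The example above starts the threads, but nothing can stop them early. Since a running thread can't be killed, the usual workaround is cooperative cancellation: share a threading.Event, have each worker check it between units of work, and set it when one worker finds the keyword. A sketch of that idea (Python 3; find_keyword is a hypothetical stand-in for the asker's crawl logic):

import random
import threading

stop_event = threading.Event()

def find_keyword():
    # Hypothetical placeholder for one unit of crawling work;
    # pretend the keyword turns up about 10% of the time.
    return random.random() < 0.1

def crawl(worker_id):
    # Keep working until we find the keyword or another worker signals us.
    while not stop_event.is_set():
        if find_keyword():
            print("worker %d found the keyword" % worker_id)
            stop_event.set()  # tell the other workers to stop
            return

threads = [threading.Thread(target=crawl, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all workers stopped")

The threads only stop at the points where they check the event, so long blocking calls (like a slow HTTP request) still have to finish before a worker notices the signal.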

With additional overhead, you can use a multi-process approach, which is more powerful and allows you to terminate thread-like processes.

I've extended the example to use that. I hope this will be helpful to you:

import multiprocessing

def crawl(result_queue):
    import urllib2
    data = urllib2.urlopen("http://news.ycombinator.com/").read()
    print "Requested..."

    if "result found (for example)":  # always-true placeholder for a real check
        result_queue.put("result!")

    print "Read site."

processes = []
result_queue = multiprocessing.Queue()

for n in range(4): # start 4 processes crawling for the result
    process = multiprocessing.Process(target=crawl, args=[result_queue])
    process.start()
    processes.append(process)

print "Waiting for result..."

result = result_queue.get() # blocks until any of the processes has `.put()` a result

for process in processes: # then kill them all off
    process.terminate()

print "Got result:", result
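For anyone on Python 3, where urllib2 and the print statement no longer exist, the same pattern translates almost line for line (urllib.request replaces urllib2, and the __main__ guard is needed for multiprocessing on platforms that spawn rather than fork). A sketch, not part of the original answer:

import multiprocessing
import urllib.request

def crawl(result_queue):
    data = urllib.request.urlopen("http://news.ycombinator.com/").read()
    print("Requested...")

    if "result found (for example)":  # always-true placeholder for a real check
        result_queue.put("result!")

    print("Read site.")

if __name__ == "__main__":
    processes = []
    result_queue = multiprocessing.Queue()

    for n in range(4):  # start 4 processes crawling for the result
        process = multiprocessing.Process(target=crawl, args=(result_queue,))
        process.start()
        processes.append(process)

    print("Waiting for result...")
    result = result_queue.get()  # blocks until any process has put a result

    for process in processes:  # then kill them all off
        process.terminate()

    print("Got result:", result)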
answered Sep 25 '22 by Jeremy