Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Threading in python using queue

I wanted to use threading in python to download lot of webpages and went through the following code which uses queues in one of the website.

it puts a infinite while loop. Does each of thread run continuously with out ending till all of them are complete? Am I missing something.

#!/usr/bin/env python import Queue import threading import urllib2 import time  hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com", "http://ibm.com", "http://apple.com"]  queue = Queue.Queue()  class ThreadUrl(threading.Thread):   """Threaded Url Grab"""   def __init__(self, queue):     threading.Thread.__init__(self)     self.queue = queue    def run(self):     while True:       #grabs host from queue       host = self.queue.get()        #grabs urls of hosts and prints first 1024 bytes of page       url = urllib2.urlopen(host)       print url.read(1024)        #signals to queue job is done       self.queue.task_done()  start = time.time() def main():    #spawn a pool of threads, and pass them queue instance    for i in range(5):     t = ThreadUrl(queue)     t.setDaemon(True)     t.start()    #populate queue with data      for host in hosts:     queue.put(host)    #wait on the queue until everything has been processed        queue.join()  main() print "Elapsed Time: %s" % (time.time() - start) 
like image 504
raju Avatar asked Nov 20 '12 20:11

raju


People also ask

How do you queue a thread in Python?

You can make a queue or line of tasks or objects by using the queue library in Python. Simply you can add a task to the queue (using put() method) or get a task out of the line and processes it (using get() method). Threading package in Python let you run multiple tasks at the same time.

Are queues in Python thread-safe?

Yes, Queue is thread-safe.

What does queue () do in Python?

Queue is built-in module of Python which is used to implement a queue. queue. Queue(maxsize) initializes a variable to a maximum size of maxsize. A maxsize of zero '0' means a infinite queue.

What is the threading function in Python?

Threading in python is used to run multiple threads (tasks, function calls) at the same time. Note that this does not mean that they are executed on different CPUs. Python threads will NOT make your program faster if it already uses 100 % CPU time.


2 Answers

Setting the thread's to be daemon threads causes them to exit when the main is done. But, yes you are correct in that your threads will run continuously for as long as there is something in the queue else it will block.

The documentation explains this detail Queue docs

The python Threading documentation explains the daemon part as well.

The entire Python program exits when no alive non-daemon threads are left.

So, when the queue is emptied and the queue.join resumes when the interpreter exits the threads will then die.

EDIT: Correction on default behavior for Queue

like image 192
sean Avatar answered Sep 22 '22 01:09

sean


Your script works fine for me, so I assume you are asking what is going on so you can understand it better. Yes, your subclass puts each thread in an infinite loop, waiting on something to be put in the queue. When something is found, it grabs it and does its thing. Then, the critical part, it notifies the queue that it's done with queue.task_done, and resumes waiting for another item in the queue.

While all this is going on with the worker threads, the main thread is waiting (join) until all the tasks in the queue are done, which will be when the threads have sent the queue.task_done flag the same number of times as messages in the queue . At that point the main thread finishes and exits. Since these are deamon threads, they close down too.

This is cool stuff, threads and queues. It's one of the really good parts of Python. You will hear all kinds of stuff about how threading in Python is screwed up with the GIL and such. But if you know where to use them (like in this case with network I/O), they will really speed things up for you. The general rule is if you are I/O bound, try and test threads; if you are cpu bound, threads are probably not a good idea, maybe try processes instead.

good luck,

Mike

like image 27
MikeHunter Avatar answered Sep 21 '22 01:09

MikeHunter