Need some assistance with Python threading/queue

import threading
import Queue
import urllib2
import time

class ThreadURL(threading.Thread):

    def __init__(self, queue):
        threading.Thread.__init__(self)

        self.queue = queue

    def run(self):
        while True:
            host = self.queue.get()
            sock = urllib2.urlopen(host)
            data = sock.read()

            self.queue.task_done()

hosts = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.facebook.com', 'http://stackoverflow.com']
start = time.time()

def main():
    queue = Queue.Queue()

    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.start()

    for host in hosts:
        queue.put(host)

    queue.join()

if __name__ == '__main__':
    main()
    print 'Elapsed time: {0}'.format(time.time() - start)

I've been trying to get my head around threading, and after a few tutorials I've come up with the above.

What it's supposed to do is:

  1. Initialise the queue
  2. Create my Thread pool and then queue up the list of hosts
  3. My ThreadURL class should then begin work once a host is in the queue and read the website data
  4. The program should finish

First off, what I want to know is: am I doing this correctly? Is this the best way to handle threads?

Secondly, my program fails to exit. It prints the "Elapsed time" line and then hangs there, and I have to kill my terminal to make it go away. I'm assuming this is due to my incorrect use of queue.join()?

dave asked Nov 09 '10 06:11


People also ask

What are the limitations of threading in Python?

In CPython (the standard interpreter), threads within one process cannot execute Python bytecode in parallel. They run concurrently instead: the interpreter switches between them, and a thread that is blocked on I/O lets the others proceed. This limitation is enforced by the Global Interpreter Lock (GIL), which allows only one thread in a process to execute Python bytecode at a time.
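A minimal Python 3 sketch of why threads still pay off for I/O-bound work such as the URL fetches in the question. Here time.sleep() stands in for a blocking network call and, like blocking I/O, lets other threads run while it waits. The function name fake_io and the 0.2 s delay are illustrative assumptions, not part of the original.

```python
import threading
import time

def fake_io(delay):
    # Hypothetical stand-in for a blocking network call; sleeping lets other threads run.
    time.sleep(delay)

def run_threaded(n, delay):
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return time.perf_counter() - start

elapsed = run_threaded(4, 0.2)
print('4 waits of 0.2 s took about {0:.2f} s'.format(elapsed))
```

The four waits overlap, so the total is typically close to 0.2 s rather than 0.8 s. A CPU-bound loop run in the same threads would not speed up this way, because of the GIL.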

How do you queue a thread in Python?

You can build a queue of tasks or objects with Python's queue module (named Queue in Python 2, as in the code above). Add a task with the put() method, and take a task off the queue for processing with the get() method.
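A minimal Python 3 sketch of the put()/get() cycle with a single worker thread. It also shows the task_done()/join() pairing the question relies on: join() returns only after task_done() has been called once for every item that was put(). The names worker and processed are illustrative; daemon=True is used so the worker, which loops forever, does not keep the program alive.

```python
import queue
import threading

q = queue.Queue()
processed = []

def worker():
    while True:
        item = q.get()          # blocks until an item is available
        processed.append(item.upper())
        q.task_done()           # tell the queue this item is finished

threading.Thread(target=worker, daemon=True).start()

for word in ['a', 'b', 'c']:
    q.put(word)                 # add tasks to the queue

q.join()                        # returns once every put() item is task_done()
print(processed)                # ['A', 'B', 'C']
```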

Which Python libraries support threads?

The Python standard library provides the threading module, which contains the main threading primitives (Thread, Lock, Event, and so on). Its Thread class encapsulates a thread behind a clean interface: when you create a Thread, you pass it a target function and a sequence (usually a tuple) of arguments for that function.
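A short Python 3 sketch of creating Thread objects with a target function and an args tuple. The function square_into and the results list are illustrative assumptions; each thread writes only to its own index, so no lock is needed.

```python
import threading

def square_into(results, index, value):
    # Each thread writes only its own slot of the shared list.
    results[index] = value * value

values = [1, 2, 3, 4]
results = [None] * len(values)

threads = [
    threading.Thread(target=square_into, args=(results, i, v))
    for i, v in enumerate(values)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all results before reading them

print(results)  # [1, 4, 9, 16]
```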

How do I enable threads in Python?

Keep a reference to the thread object in a variable and start it through that variable: thread1 = threading.Thread(target=f) followed by thread1.start(). Then you can do thread1.


2 Answers

Your code looks fine and is quite clean.

The reason your application still "hangs" is that the worker threads are still running: each one is blocked in queue.get(), waiting for the main thread to put something in the queue, even though the main thread has finished. Python does not exit while any non-daemon thread is still alive.
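The Python 3 sketch below reproduces that situation: after q.join() returns, the non-daemon worker is still alive, blocked in get(). So that the sketch itself can finish, it then stops the worker with a sentinel value (None), a common alternative to daemon threads that the question and answers do not use. The sentinel choice and all names are assumptions for illustration.

```python
import queue
import threading

q = queue.Queue()
seen = []

def worker():
    while True:
        item = q.get()
        if item is None:        # sentinel: stop this worker
            q.task_done()
            break
        seen.append(item)
        q.task_done()

t = threading.Thread(target=worker)  # non-daemon, as in the question
t.start()

for host in ['host-a', 'host-b']:
    q.put(host)
q.join()

still_running = t.is_alive()   # True: the worker is blocked in q.get()
print('worker alive after join:', still_running)

q.put(None)                    # one sentinel per worker thread
t.join()                       # the worker leaves its loop, so the program can end
print('worker alive after sentinel:', t.is_alive())
```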

The simplest way to fix this is to mark the threads as daemons by setting t.daemon = True before your call to t.start(). The interpreter does not wait for daemon threads when the main thread finishes, so they no longer block the program from exiting.
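A minimal Python 3 sketch of the question's program with that fix applied (in Python 3, Queue became queue and urllib2 became urllib.request). To keep it runnable offline, fake_fetch is a hypothetical stand-in for urllib.request.urlopen(host).read(), and the results dict is an addition so the fetched data can be inspected.

```python
import queue
import threading
import time

hosts = ['http://www.google.com', 'http://www.yahoo.com',
         'http://www.facebook.com', 'http://stackoverflow.com']

def fake_fetch(host):
    # Hypothetical stand-in for urllib.request.urlopen(host).read()
    time.sleep(0.1)
    return 'data from ' + host

class ThreadURL(threading.Thread):
    def __init__(self, q, results):
        super().__init__()
        self.q = q
        self.results = results

    def run(self):
        while True:
            host = self.q.get()
            try:
                self.results[host] = fake_fetch(host)
            finally:
                self.q.task_done()  # always mark the item done, even on error

def main():
    q = queue.Queue()
    results = {}
    for _ in range(len(hosts)):
        t = ThreadURL(q, results)
        t.daemon = True             # the fix: set before start()
        t.start()
    for host in hosts:
        q.put(host)
    q.join()                        # wait until every host is processed
    return results

if __name__ == '__main__':
    start = time.time()
    results = main()
    print('Elapsed time: {0}'.format(time.time() - start))
```

The try/finally is also an addition: without it, an exception raised by the fetch would kill the worker before task_done() runs, and q.join() would then wait forever.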

Yann Ramin answered Sep 29 '22 13:09


Looks fine. Yann is right about the daemon suggestion; that will fix your hang. My only question is: why use the queue at all? Apart from handing each host to a worker, you're not doing any cross-thread communication, and you start one thread per host anyway, so it seems like you could just pass the host to ThreadURL's __init__() as an argument and drop the queue.

Nothing wrong with it, just wondering.
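A minimal Python 3 sketch of that queue-free variant: each thread receives its host in __init__() and fetches it once. As above, fake_fetch is a hypothetical stand-in for urllib.request.urlopen(host).read(), and storing the result on self.data is an illustrative choice.

```python
import threading
import time

hosts = ['http://www.google.com', 'http://www.yahoo.com',
         'http://www.facebook.com', 'http://stackoverflow.com']

def fake_fetch(host):
    # Hypothetical stand-in for urllib.request.urlopen(host).read()
    time.sleep(0.1)
    return 'data from ' + host

class ThreadURL(threading.Thread):
    def __init__(self, host):
        super().__init__()
        self.host = host        # each thread gets its own host up front
        self.data = None

    def run(self):
        self.data = fake_fetch(self.host)

threads = [ThreadURL(host) for host in hosts]
for t in threads:
    t.start()
for t in threads:
    t.join()                    # run() returns, so join() does not hang

print([t.data for t in threads])
```

Because run() returns after a single fetch, the threads end on their own and no daemon flag is needed. The trade-off is one thread per host; with many hosts, a fixed-size pool fed by a queue (as in the question) caps the number of threads.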

mix answered Sep 29 '22 11:09