I spent a whole day looking for the simplest possible multithreaded URL fetcher in Python, but most of the scripts I found use queues, multiprocessing, or complex libraries.
Finally I wrote one myself, which I am posting as an answer. Please feel free to suggest any improvements.
I guess other people might have been looking for something similar.
Multithreading across multiple processor cores is truly parallel: separate cores execute instructions simultaneously, so several concurrent tasks genuinely happen at once rather than merely taking turns on one core.
Python is NOT a single-threaded language. You can create as many threads as you like, but because of the GIL only one of them executes Python bytecode at any given moment. Despite the GIL, libraries that perform computationally heavy tasks, like numpy, scipy, and pytorch, use C-based implementations under the hood that can release the GIL and make use of multiple cores.
Within a process or program, we can run multiple threads concurrently to improve performance. Unlike heavyweight processes, threads are lightweight and run inside a single process: they share its address space, its allocated resources, and its environment.
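As a tiny illustration of that shared address space (my own sketch, not from the original post), several threads can write into the same list with no message passing at all:

import threading

results = []  # one list, shared by every thread in the process

def worker(n):
    results.append(n * n)  # in CPython, list.append is atomic thanks to the GIL

threads = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9, 16]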
Both multithreading and multiprocessing allow Python code to run concurrently, but only multiprocessing makes it truly parallel. However, if your code is I/O-bound (like HTTP requests), then multithreading will still probably speed it up significantly, as the timing sketch below illustrates.
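To make that concrete, here is a minimal sketch (mine, not part of the original question) that simulates I/O-bound work with time.sleep, which releases the GIL just as waiting on a socket does. The threaded run takes roughly the time of the slowest task instead of the sum of all of them:

import threading
import time

def io_task(seconds):
    time.sleep(seconds)  # sleeping releases the GIL, like waiting on a socket

delays = [1, 1, 1, 1, 1]

# Sequential: takes about sum(delays) = 5 seconds
start = time.time()
for d in delays:
    io_task(d)
print("sequential: %.1fs" % (time.time() - start))

# Threaded: takes about max(delays) = 1 second
start = time.time()
threads = [threading.Thread(target=io_task, args=(d,)) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("threaded: %.1fs" % (time.time() - start))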
Simplifying your original version as far as possible:
import threading
import urllib2
import time

start = time.time()
urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com",
        "http://www.amazon.com", "http://www.facebook.com"]

def fetch_url(url):
    urlHandler = urllib2.urlopen(url)
    html = urlHandler.read()
    print "'%s' fetched in %ss" % (url, (time.time() - start))

# Create one thread per URL, start them all, then wait for all of them to finish.
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print "Elapsed Time: %s" % (time.time() - start)
The only new tricks here are:

- Keep track of each thread you create.
- Don't bother with a pool of threads if you just want to know when they are all done; join already tells you that.
- You don't need a Thread subclass, just a target function.

Alternatively, multiprocessing has a thread pool that doesn't start other processes:
#!/usr/bin/env python
from multiprocessing.pool import ThreadPool
from time import time as timer
from urllib2 import urlopen

urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com",
        "http://www.amazon.com", "http://www.facebook.com"]

def fetch_url(url):
    try:
        response = urlopen(url)
        return url, response.read(), None
    except Exception as e:
        return url, None, e

start = timer()
# Map fetch_url over the URLs with at most 20 worker threads;
# imap_unordered yields each result as soon as it is ready.
results = ThreadPool(20).imap_unordered(fetch_url, urls)
for url, html, error in results:
    if error is None:
        print("%r fetched in %ss" % (url, timer() - start))
    else:
        print("error fetching %r: %s" % (url, error))
print("Elapsed Time: %s" % (timer() - start,))
The advantages compared to the Thread-based solution:

- ThreadPool lets you limit the maximum number of concurrent connections (20 in the code example).
- The same code also runs on Python 3 (from urllib.request import urlopen on Python 3).
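If you are on Python 3 anyway, the standard-library concurrent.futures module gives you the same pattern. Here is a minimal sketch of that variant (my addition, not part of the answers above, reusing the same URL list):

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import time as timer
from urllib.request import urlopen

urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com",
        "http://www.amazon.com", "http://www.facebook.com"]

def fetch_url(url):
    with urlopen(url) as response:
        return url, response.read()

start = timer()
# max_workers plays the same role as ThreadPool(20) in the example above
with ThreadPoolExecutor(max_workers=20) as pool:
    futures = [pool.submit(fetch_url, url) for url in urls]
    for future in as_completed(futures):  # yields each future as it finishes
        try:
            url, html = future.result()  # re-raises any exception from fetch_url
            print("%r fetched in %ss" % (url, timer() - start))
        except Exception as e:
            print("error while fetching: %s" % e)
print("Elapsed Time: %s" % (timer() - start))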