Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Asynchronous HTTP calls in Python

I have a need for a callback kind of functionality in Python where I am sending a request to a webservice multiple times, with a change in the parameter each time. I want these requests to happen concurrently instead of sequentially, so I want the function to be called asynchronously.

It looks like asyncore is what I might want to use, but the examples I've seen of how it works all look like overkill, so I'm wondering if there's another path I should be going down. Any suggestions on modules/process? Ideally I'd like to use these in a procedural fashion instead of creating classes but I may not be able to get around that.

like image 327
kasceled Avatar asked Feb 10 '11 21:02

kasceled


2 Answers

Starting in Python 3.2, you can use concurrent.futures for launching parallel tasks.

Check out this ThreadPoolExecutor example:

http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example

It spawns threads to retrieve HTML and acts on responses as they are received.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    conn = urllib.request.urlopen(url, timeout=timeout)
    return conn.readall()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

The above example uses threading. There is also a similar ProcessPoolExecutor that uses a pool of processes, rather than threads:

http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    conn = urllib.request.urlopen(url, timeout=timeout)
    return conn.readall()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
like image 58
Corey Goldberg Avatar answered Oct 05 '22 00:10

Corey Goldberg


Do you know about eventlet? It lets you write what appears to be synchronous code, but have it operate asynchronously over the network.

Here's an example of a super minimal crawler:

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):

  return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
  print "got body", len(body)
like image 29
Raj Avatar answered Oct 05 '22 00:10

Raj