Making multiple calls with asyncio and adding result to a dictionary

I am having trouble wrapping my head around Python 3's asyncio library. I have a list of zipcodes and I am trying to make async calls to an API to get each zipcode's corresponding city and state. I can do it successfully in sequence with a for loop, but I want to make it faster for a big zipcode list.

This is an example of my original code, which works:

import urllib.request, json

zips = ['90210', '60647']

def get_cities(zipcodes):
    zip_cities = dict()
    for idx, zipcode in enumerate(zipcodes):
        url = 'http://maps.googleapis.com/maps/api/geocode/json?address='+zipcode+'&sensor=true'
        response = urllib.request.urlopen(url)
        string = response.read().decode('utf-8')
        data = json.loads(string)
        city = data['results'][0]['address_components'][1]['long_name']
        state = data['results'][0]['address_components'][3]['long_name']
        zip_cities.update({idx: [zipcode, city, state]})
    return zip_cities

results = get_cities(zips)
print(results)
# returns {0: ['90210', 'Beverly Hills', 'California'],
#          1: ['60647', 'Chicago', 'Illinois']}

This is my terrible, non-functional attempt at making it async:

import asyncio
import urllib.request, json

zips = ['90210', '60647']
zip_cities = dict()

@asyncio.coroutine
def get_cities(zipcodes):
    url = 'http://maps.googleapis.com/maps/api/geocode/json?address='+zipcode+'&sensor=true'
    response = urllib.request.urlopen(url)
    string = response.read().decode('utf-8')
    data = json.loads(string)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    zip_cities.update({idx: [zipcode, city, state]})

loop = asyncio.get_event_loop()
loop.run_until_complete([get_cities(zip) for zip in zips])
loop.close()
print(zip_cities) # doesn't work

Any help is much appreciated. All of the tutorials I've come across online seem to be a tad over my head.

Note: I've seen some examples use aiohttp. I was hoping to stick with the native Python 3 libraries if possible.

asked Aug 22 '15 by anthony-dandrea



1 Answer

You're not going to be able to get any concurrency if you use urllib to do the HTTP requests, because it's a synchronous library; wrapping the urllib call in a coroutine doesn't change that. You have to use an asynchronous HTTP client that's integrated with asyncio, like aiohttp:

import asyncio
import json
import aiohttp

zips = ['90210', '60647']
zip_cities = dict()

@asyncio.coroutine
def get_cities(zipcode, idx):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdfg&address='+zipcode+'&sensor=true'
    response = yield from aiohttp.request('get', url)
    string = (yield from response.read()).decode('utf-8')
    data = json.loads(string)
    print(data)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    zip_cities.update({idx: [zipcode, city, state]})

if __name__ == "__main__":        
    loop = asyncio.get_event_loop()
    tasks = [asyncio.async(get_cities(z, i)) for i, z in enumerate(zips)]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    print(zip_cities)
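
Note that @asyncio.coroutine and yield from are Python 3.4-era idioms (removed in Python 3.11), and the aiohttp.request() call style above predates aiohttp's ClientSession API. On current Python (3.7+) and aiohttp 3.x, the same approach would look roughly like this sketch; the key=abcdfg placeholder is just as fake as the one above:

import asyncio
import aiohttp

async def get_city(session, zipcode):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdfg&address=' + zipcode + '&sensor=true'
    async with session.get(url) as response:
        data = await response.json()
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    return [zipcode, city, state]

async def main(zipcodes):
    # gather() preserves input order, so enumerate() rebuilds the same dict
    # the original loop produced, without mutating a global
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(get_city(session, z) for z in zipcodes))
    return dict(enumerate(results))

if __name__ == "__main__":
    print(asyncio.run(main(['90210', '60647'])))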

I know you'd prefer to only use the stdlib, but the asyncio library doesn't include an HTTP client, so you'd basically have to re-implement pieces of aiohttp to recreate the functionality it's providing. I suppose another option would be to make the urllib calls in a background thread, so that they don't block the event loop, but it's kind of silly to do that when aiohttp is available (and it sort of defeats the purpose of using asyncio in the first place):

import asyncio
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

zips = ['90210', '60647']
zip_cities = dict()

@asyncio.coroutine
def get_cities(zipcode, idx):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdfg&address='+zipcode+'&sensor=true'
    response = yield from loop.run_in_executor(executor, urllib.request.urlopen, url)
    string = response.read().decode('utf-8')
    data = json.loads(string)
    print(data)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    zip_cities.update({idx: [zipcode, city, state]})

if __name__ == "__main__":
    executor = ThreadPoolExecutor(10)
    loop = asyncio.get_event_loop()
    tasks = [asyncio.async(get_cities(z, i)) for i, z in enumerate(zips)]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    print(zip_cities)
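
On Python 3.9+ you could also drop the explicit ThreadPoolExecutor: asyncio.to_thread() runs a blocking call in a worker thread for you, and asyncio.gather() collects the return values so the dict can be built from results instead of mutated as a global. A rough stdlib-only sketch under those assumptions (same fake key=abcdfg):

import asyncio
import json
import urllib.request

def fetch(url):
    # urlopen() and read() both block, so do all of the I/O inside the thread
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))

async def get_city(zipcode):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdfg&address=' + zipcode + '&sensor=true'
    data = await asyncio.to_thread(fetch, url)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    return [zipcode, city, state]

async def main(zipcodes):
    results = await asyncio.gather(*(get_city(z) for z in zipcodes))
    return dict(enumerate(results))

if __name__ == "__main__":
    print(asyncio.run(main(['90210', '60647'])))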

answered Oct 25 '22 by dano