
Asynchronous requests backoff/throttling best practice

Scenario: I need to gather paginated data from a web app's API, which has a limit of 100 calls per minute. The API returns 100 items per page across 105 pages and growing (~10,500 items total). Synchronous code took approximately 15 minutes to retrieve all the pages, so there was no risk of hitting the call limit then. However, I wanted to speed up the data retrieval, so I implemented asynchronous calls using asyncio and aiohttp. The data now downloads in 15 seconds - nice.

Problem: I'm now hitting the call limit and receiving 403 errors for the last 5 or so calls.

Proposed Solution: I implemented the try/except found in the get_data() function below. I make the calls, and when a call fails with 403: Exceeded call limit, I back off for back_off seconds and retry up to retries times:

import asyncio
import aiohttp

async def get_data(session, url):
    retries = 3
    back_off = 60  # seconds to wait before retrying
    for attempt in range(retries):
        try:
            async with session.get(url, headers=headers) as response:  # headers defined elsewhere
                if response.status != 200:
                    response.raise_for_status()
                print(attempt, response.status, url)
                return await response.json()
        except aiohttp.ClientResponseError:
            # 403: Exceeded call limit, so wait and retry
            await asyncio.sleep(back_off)

async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee') # returns list of URLs to call asynchronously in get_data()
        attendee_data = await asyncio.gather(*[get_data(session, attendee_url) for attendee_url in attendee_urls])
        return attendee_data

if __name__ == '__main__':
    data = asyncio.run(main())

Question: How do I throttle the aiohttp calls so that they stay under the 100 calls/minute threshold, without waiting for a 403 to tell me to back off? I've tried the ratelimiter, ratelimit, and asyncio-throttle modules, and none of them appeared to do anything.

Goal: Make 100 async calls per minute, backing off and retrying when necessary (403: Exceeded call limit).

asked by gbeaven

1 Answer

You can achieve "at most 100 requests/min" by pausing before starting each request.

100 requests/min is equivalent to 1 request/0.6s.

async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        tasks = []
        for attendee_url in attendee_urls:
            # create_task() starts the request immediately, so the sleep below
            # spaces the request starts 0.6 s apart
            tasks.append(asyncio.create_task(get_data(session, attendee_url)))
            await asyncio.sleep(0.6)
        attendee_data = await asyncio.gather(*tasks)
        return attendee_data
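
Note that asyncio.create_task() is what makes the staggering work: a task starts running as soon as it is created, whereas bare coroutine objects would all start together only when asyncio.gather() awaits them, defeating the 0.6 s spacing.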

Apart from the request rate limit, APIs often also limit the number of simultaneous requests. If so, you can use a BoundedSemaphore:

async def main():
    sema = asyncio.BoundedSemaphore(50)  # assuming a concurrent requests limit of 50
...
            tasks.append(asyncio.create_task(get_data(sema, session, attendee_url)))
...

async def get_data(sema, session, url):
...
    for attempt in range(retries):
        try:
            async with sema:  # at most 50 requests in flight at once
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        response.raise_for_status()
...
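
If you'd rather not hand-roll the delay, a dedicated leaky-bucket limiter such as the aiolimiter package is another option. Here is a minimal sketch (an alternative not covered above, assuming pip install aiolimiter plus the get_urls() helper and headers from the question):

from aiolimiter import AsyncLimiter

limiter = AsyncLimiter(100, 60)  # at most 100 acquisitions per 60-second window

async def get_data(session, url):
    async with limiter:  # blocks once the per-minute budget is spent
        async with session.get(url, headers=headers) as response:
            response.raise_for_status()
            return await response.json()

async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')
        return await asyncio.gather(*[get_data(session, url) for url in attendee_urls])

This composes with the BoundedSemaphore approach above: nest async with sema: inside async with limiter: in get_data() to respect both the rate limit and the concurrency limit.
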
answered by Shiva