Scenario: I need to gather paginated data from a web app's API, which has a call limit of 100 per minute. The API object I need to return contains 100 items per page across 105 (and growing) pages, roughly 10,500 items in total. Synchronous code was taking approximately 15 minutes to retrieve all the pages, so there was no worry about hitting the call limit then. However, I wanted to speed up the data retrieval, so I implemented asynchronous calls using asyncio and aiohttp. Data now downloads in 15 seconds - nice.
Problem: I'm now hitting the call limit and receiving 403 errors for the last 5 or so calls.
Proposed Solution: I implemented the try/except found in the get_data() function. I make the calls, and when a call fails with 403: Exceeded call limit, I back off for back_off seconds and retry up to retries times:
import asyncio
import aiohttp

async def get_data(session, url):
    retries = 3
    back_off = 60  # seconds to wait before trying again
    for _ in range(retries):
        try:
            async with session.get(url, headers=headers) as response:
                if response.status != 200:
                    response.raise_for_status()
                print(retries, response.status, url)
                return await response.json()
        except aiohttp.client_exceptions.ClientResponseError as e:
            retries -= 1
            await asyncio.sleep(back_off)
            continue
async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        attendee_data = await asyncio.gather(*[get_data(session, attendee_url) for attendee_url in attendee_urls])
        return attendee_data

if __name__ == '__main__':
    data = asyncio.run(main())
Question: How do I limit the aiohttp calls so that they stay under the 100 calls/minute threshold in the first place, rather than hitting a 403 and backing off? I've tried the following modules and none of them appeared to do anything: ratelimiter, ratelimit, and asyncio-throttle.
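For reference, asyncio-throttle is designed to be used as an async context manager wrapped around each request; a minimal sketch of that wiring (based on its documented Throttler interface, with the rate and period taken from the limit above) would look like this:

from asyncio_throttle import Throttler

throttler = Throttler(rate_limit=100, period=60)  # 100 requests per 60-second window

async def get_data(session, url):
    async with throttler:  # waits until a request slot is free before proceeding
        async with session.get(url, headers=headers) as response:
            response.raise_for_status()
            return await response.json()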
Goal: To make 100 async calls per minute, but backing off and retrying if necessary (403: Exceeded call limit).
You can achieve "at most 100 requests/min" by adding a delay before starting each request. 100 requests/min is equivalent to one request every 0.6 s.
async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        tasks = []
        for attendee_url in attendee_urls:
            # start the request immediately as a task, then wait 0.6 s before starting the next one
            tasks.append(asyncio.create_task(get_data(session, attendee_url)))
            await asyncio.sleep(0.6)
        attendee_data = await asyncio.gather(*tasks)
        return attendee_data
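Wrapping each call in asyncio.create_task() is what makes the delay effective: the request begins as soon as the task is created, and the 0.6 s sleep spaces out the start times, so at most 100 requests start in any given minute. With ~105 pages the loop itself takes about 63 s, after which asyncio.gather() waits for the remaining responses; any 403 that still slips through is handled by the existing back-off in get_data().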
Apart from the request rate limit, APIs often also limit the number of simultaneous requests. If so, you can use asyncio.BoundedSemaphore.
async def main():
    sema = asyncio.BoundedSemaphore(50)  # assuming a concurrent-requests limit of 50
    ...
            tasks.append(asyncio.create_task(get_data(sema, session, attendee_url)))
    ...

async def get_data(sema, session, url):
    ...
    for _ in range(retries):
        try:
            async with sema:  # only proceed while fewer than 50 requests are in flight
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        response.raise_for_status()
                    ...
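Putting the pieces together (the 0.6 s spacing, the semaphore, and your existing back-off), a complete sketch could look like the following. get_urls() and headers are assumed to come from your existing code, and the semaphore size of 50 is only a placeholder for whatever concurrency limit the API actually enforces:

import asyncio
import aiohttp

headers = {}          # your existing auth headers
RATE_DELAY = 0.6      # 100 requests/min -> start at most one request every 0.6 s
MAX_CONCURRENT = 50   # placeholder for the API's simultaneous-request limit
RETRIES = 3
BACK_OFF = 60         # seconds to wait after a 403 before retrying

async def get_data(sema, session, url):
    for _ in range(RETRIES):
        try:
            async with sema:  # cap the number of in-flight requests
                async with session.get(url, headers=headers) as response:
                    response.raise_for_status()  # raises ClientResponseError on 403 etc.
                    return await response.json()
        except aiohttp.ClientResponseError:
            await asyncio.sleep(BACK_OFF)  # back off, then retry
    return None  # give up after RETRIES failed attempts

async def main():
    sema = asyncio.BoundedSemaphore(MAX_CONCURRENT)
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in get_urls('attendee'):  # get_urls() is from your existing code
            tasks.append(asyncio.create_task(get_data(sema, session, url)))
            await asyncio.sleep(RATE_DELAY)  # space out request starts
        return await asyncio.gather(*tasks)

if __name__ == '__main__':
    data = asyncio.run(main())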