
How to limit concurrency with Python asyncio?

Let's assume we have a bunch of links to download, and each link may take a different amount of time to download. And I'm allowed to download using at most 3 connections only. Now, I want to ensure that I do this efficiently using asyncio.

Here's what I'm trying to achieve: at any point in time, try to ensure that I have at least 3 downloads running.

```
Connection 1: 1---------7---9---
Connection 2: 2---4----6-----
Connection 3: 3-----5---8-----
```

The numbers represent the download links, while the hyphens represent waiting for the download.

Here is the code that I'm using right now:

```python
from random import randint
import asyncio

count = 0


async def download(code, permit_download, no_concurrent, downloading_event):
    global count
    downloading_event.set()
    wait_time = randint(1, 3)
    print('downloading {} will take {} second(s)'.format(code, wait_time))
    await asyncio.sleep(wait_time)  # I/O, context will switch to main function
    print('downloaded {}'.format(code))
    count -= 1
    if count < no_concurrent and not permit_download.is_set():
        permit_download.set()


async def main(loop):
    global count
    permit_download = asyncio.Event()
    permit_download.set()
    downloading_event = asyncio.Event()
    no_concurrent = 3
    i = 0
    while i < 9:
        if permit_download.is_set():
            count += 1
            if count >= no_concurrent:
                permit_download.clear()
            loop.create_task(download(i, permit_download, no_concurrent, downloading_event))
            await downloading_event.wait()  # To force context to switch to download function
            downloading_event.clear()
            i += 1
        else:
            await permit_download.wait()
    await asyncio.sleep(9)


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main(loop))
    finally:
        loop.close()
```

And the output is as expected:

```
downloading 0 will take 2 second(s)
downloading 1 will take 3 second(s)
downloading 2 will take 1 second(s)
downloaded 2
downloading 3 will take 2 second(s)
downloaded 0
downloading 4 will take 3 second(s)
downloaded 1
downloaded 3
downloading 5 will take 2 second(s)
downloading 6 will take 2 second(s)
downloaded 5
downloaded 6
downloaded 4
downloading 7 will take 1 second(s)
downloading 8 will take 1 second(s)
downloaded 7
downloaded 8
```

But here are my questions:

  1. At the moment, I'm simply sleeping for 9 seconds to keep the main function running until the downloads are complete. Is there a more efficient way of waiting for the last download to complete before exiting the main function? (I know there's asyncio.wait, but I'll need to store all the task references for it to work.)

  2. What's a good library that does this kind of task? I know JavaScript has a lot of async libraries, but what about Python?

Edit: Rephrasing question 2: What's a good library that takes care of common async patterns? (Something like JavaScript's async library.)
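Regarding question 1, storing the task references is less work than it sounds: keep them in a list and hand that list to asyncio.wait (or asyncio.gather). A minimal sketch, assuming Python 3.7+ for asyncio.run, with a simplified stand-in download coroutine and no concurrency limiting:

```python
import asyncio
from random import randint


async def download(code):
    wait_time = randint(1, 3)
    print('downloading {} will take {} second(s)'.format(code, wait_time))
    await asyncio.sleep(wait_time)
    print('downloaded {}'.format(code))
    return code


async def main():
    # Keep the task references in a list so we can wait on all of them,
    # instead of sleeping for a fixed number of seconds.
    tasks = [asyncio.ensure_future(download(i)) for i in range(9)]
    done, pending = await asyncio.wait(tasks)  # returns once every task finishes
    return [t.result() for t in done]


results = asyncio.run(main())
```

main exits as soon as the last download finishes, whether that takes 2 seconds or 20.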

Asked Jan 28 '18 by Shridharshan



1 Answer

If I'm not mistaken, you're searching for asyncio.Semaphore. Example of usage:

```python
import asyncio
from random import randint


async def download(code):
    wait_time = randint(1, 3)
    print('downloading {} will take {} second(s)'.format(code, wait_time))
    await asyncio.sleep(wait_time)  # I/O, context will switch to main function
    print('downloaded {}'.format(code))


sem = asyncio.Semaphore(3)


async def safe_download(i):
    async with sem:  # semaphore limits num of simultaneous downloads
        return await download(i)


async def main():
    tasks = [
        asyncio.ensure_future(safe_download(i))  # creating task starts coroutine
        for i
        in range(9)
    ]
    await asyncio.gather(*tasks)  # await moment all downloads done


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
```

Output:

```
downloading 0 will take 3 second(s)
downloading 1 will take 3 second(s)
downloading 2 will take 1 second(s)
downloaded 2
downloading 3 will take 3 second(s)
downloaded 1
downloaded 0
downloading 4 will take 2 second(s)
downloading 5 will take 1 second(s)
downloaded 5
downloaded 3
downloading 6 will take 3 second(s)
downloading 7 will take 1 second(s)
downloaded 4
downloading 8 will take 2 second(s)
downloaded 7
downloaded 8
downloaded 6
```

An example of async downloading with aiohttp can be found here. Note that aiohttp has a Semaphore equivalent built in, which you can see an example of here. It has a default limit of 100 connections.

Answered Sep 19 '22 by Mikhail Gerasimov