Let's assume we have a bunch of links to download, and each link may take a different amount of time to download. I'm allowed to download using at most 3 connections only. Now, I want to ensure that I do this efficiently using asyncio.
Here's what I'm trying to achieve: at any point in time, try to ensure that I have at least 3 downloads running.
Connection 1: 1---------7---9---
Connection 2: 2---4----6-----
Connection 3: 3-----5---8-----
The numbers represent the download links, while the hyphens represent time spent waiting for the download.
Here is the code that I'm using right now:
from random import randint
import asyncio

count = 0


async def download(code, permit_download, no_concurrent, downloading_event):
    global count
    downloading_event.set()
    wait_time = randint(1, 3)
    print('downloading {} will take {} second(s)'.format(code, wait_time))
    await asyncio.sleep(wait_time)  # I/O, context will switch to main function
    print('downloaded {}'.format(code))
    count -= 1
    if count < no_concurrent and not permit_download.is_set():
        permit_download.set()


async def main(loop):
    global count
    permit_download = asyncio.Event()
    permit_download.set()
    downloading_event = asyncio.Event()
    no_concurrent = 3
    i = 0
    while i < 9:
        if permit_download.is_set():
            count += 1
            if count >= no_concurrent:
                permit_download.clear()
            loop.create_task(download(i, permit_download, no_concurrent, downloading_event))
            await downloading_event.wait()  # To force context to switch to download function
            downloading_event.clear()
            i += 1
        else:
            await permit_download.wait()
    await asyncio.sleep(9)


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main(loop))
    finally:
        loop.close()
And the output is as expected:
downloading 0 will take 2 second(s)
downloading 1 will take 3 second(s)
downloading 2 will take 1 second(s)
downloaded 2
downloading 3 will take 2 second(s)
downloaded 0
downloading 4 will take 3 second(s)
downloaded 1
downloaded 3
downloading 5 will take 2 second(s)
downloading 6 will take 2 second(s)
downloaded 5
downloaded 6
downloaded 4
downloading 7 will take 1 second(s)
downloading 8 will take 1 second(s)
downloaded 7
downloaded 8
But here are my questions:
1. At the moment, I'm simply waiting for 9 seconds to keep the main function running until the downloads are complete. Is there an efficient way of waiting for the last download to complete before exiting the main function? (I know there's asyncio.wait, but I'd need to store all the task references for it to work.)

2. What's a good library that does this kind of task? I know JavaScript has a lot of async libraries, but what about Python?

Edit: 2. What's a good library that takes care of common async patterns? (Something like async)
asyncio was first introduced in Python 3.4 as an additional way to handle these highly concurrent workloads outside of multithreading and multiprocessing.
Threading and asyncio both run on a single processor and therefore only run one task at a time; they just cleverly find ways to take turns to speed up the overall process. Even though they don't run different trains of thought simultaneously, we still call this concurrency.
asyncio tries its best to be concurrent, but it is not parallel. You cannot control exactly when a task starts or ends. You can force a task to run to completion by awaiting it immediately after it is created, as in the sketch below, but then the code effectively becomes synchronous, which defeats the purpose of asynchronous programming.
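A minimal sketch of that point, assuming Python 3.7+ (asyncio.create_task and asyncio.run); fetch is a made-up placeholder for a download coroutine:

import asyncio
import time


async def fetch(i):
    await asyncio.sleep(1)  # stands in for a 1-second download
    return i


async def main():
    start = time.monotonic()
    for i in range(3):
        task = asyncio.create_task(fetch(i))
        await task  # awaiting each task right away serializes the work
    print('took {:.1f} second(s)'.format(time.monotonic() - start))  # ~3 seconds


asyncio.run(main())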
Much like we can have multiple threads running at the same time, each with its own concurrent I/O operation, we can have many coroutines running alongside one another. While we are waiting for our I/O-bound coroutines to finish, we can still execute other Python code, which gives us concurrency.
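For contrast, a minimal sketch of the concurrent version, again using the placeholder fetch coroutine and Python 3.7+ APIs; asyncio.gather lets the waits overlap:

import asyncio
import time


async def fetch(i):
    await asyncio.sleep(1)  # stands in for a 1-second download
    return i


async def main():
    start = time.monotonic()
    results = await asyncio.gather(*(fetch(i) for i in range(3)))
    print(results)  # [0, 1, 2]
    print('took {:.1f} second(s)'.format(time.monotonic() - start))  # ~1 second


asyncio.run(main())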
If I'm not mistaken, you're searching for asyncio.Semaphore. Example of usage:
import asyncio
from random import randint


async def download(code):
    wait_time = randint(1, 3)
    print('downloading {} will take {} second(s)'.format(code, wait_time))
    await asyncio.sleep(wait_time)  # I/O, context will switch to main function
    print('downloaded {}'.format(code))


sem = asyncio.Semaphore(3)


async def safe_download(i):
    async with sem:  # semaphore limits num of simultaneous downloads
        return await download(i)


async def main():
    tasks = [
        asyncio.ensure_future(safe_download(i))  # creating task starts coroutine
        for i in range(9)
    ]
    await asyncio.gather(*tasks)  # await moment all downloads done


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
Output:
downloading 0 will take 3 second(s)
downloading 1 will take 3 second(s)
downloading 2 will take 1 second(s)
downloaded 2
downloading 3 will take 3 second(s)
downloaded 1
downloaded 0
downloading 4 will take 2 second(s)
downloading 5 will take 1 second(s)
downloaded 5
downloaded 3
downloading 6 will take 3 second(s)
downloading 7 will take 1 second(s)
downloaded 4
downloading 8 will take 2 second(s)
downloaded 7
downloaded 8
downloaded 6
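The code above uses the pre-3.7 event-loop boilerplate. A sketch of the same idea on Python 3.7+, where asyncio.run replaces the explicit loop management; the semaphore is created inside main() so it is bound to the loop that asyncio.run starts (on some Python versions a module-level semaphore can end up attached to a different loop):

import asyncio
from random import randint


async def download(code):
    wait_time = randint(1, 3)
    print('downloading {} will take {} second(s)'.format(code, wait_time))
    await asyncio.sleep(wait_time)
    print('downloaded {}'.format(code))


async def safe_download(i, sem):
    async with sem:  # semaphore limits num of simultaneous downloads
        return await download(i)


async def main():
    sem = asyncio.Semaphore(3)  # created inside the running event loop
    await asyncio.gather(*(safe_download(i, sem) for i in range(9)))


if __name__ == '__main__':
    asyncio.run(main())  # handles loop creation, shutdown_asyncgens and close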
An example of async downloading with aiohttp can be found here. Note that aiohttp has a Semaphore equivalent built in, which you can see an example of here. It has a default limit of 100 connections.
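For illustration only (not the linked example), a sketch of how that built-in limit is typically applied: aiohttp's TCPConnector takes a limit argument that caps simultaneous connections, so no explicit Semaphore is needed; the URLs below are placeholders:

import asyncio
import aiohttp


async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.read()


async def main(urls):
    # limit=3 caps simultaneous connections, playing the role of the
    # Semaphore in the code above (the default limit is 100)
    connector = aiohttp.TCPConnector(limit=3)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*(fetch(session, url) for url in urls))


if __name__ == '__main__':
    urls = ['https://example.com/{}'.format(i) for i in range(9)]  # placeholder URLs
    asyncio.run(main(urls))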