I've had a hard time trying to understand how and why async functionality works in Python, and I am still not sure I understand everything correctly (especially the 'why' part). Please correct me if I am wrong.
The purpose of both async methods and threads is to make it possible to process several tasks concurrently.
The thread approach looks simple and intuitive. If a Python program processes several tasks concurrently, we have a thread (maybe with sub-threads) for each task, and the stack of each thread reflects the current stage of processing of the corresponding task. Everything is straightforward: there are easy-to-use mechanisms to start a new thread and wait for its results.
As I understand the only problem with this approach is that threads are expensive.
Another approach is using async coroutines. I can see several inconveniences with this approach; I'll name only a couple of them. We now have two types of methods: usual methods and async methods. 90% of the time the only difference is that you need to remember that the method is async and not forget to use the await keyword when calling it. And yes, you can't call an async method from a normal one. And all this async - await syntactic garbage all around the program is only to indicate that this method is able to yield control to message loop.
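The two kinds of functions described above can be sketched as follows. This is a minimal illustration, not anything from the original question: fetch_value, caller and sync_caller are hypothetical names, and asyncio.sleep stands in for real I/O. It also shows one nuance: a regular function cannot use await, but it can hand a coroutine to an event loop with asyncio.run, provided no loop is already running in that thread.

```python
import asyncio


async def fetch_value() -> int:
    # Hypothetical coroutine; the sleep stands in for real I/O.
    await asyncio.sleep(0.01)
    return 42


async def caller() -> int:
    # Inside a coroutine, await suspends this function until
    # fetch_value() has produced its result.
    return await fetch_value()


def sync_caller() -> int:
    # A regular function cannot use await. It can still run a coroutine
    # by starting an event loop, as long as none is already running
    # in the current thread.
    return asyncio.run(caller())


if __name__ == "__main__":
    # Calling an async function without await only creates a coroutine object.
    coro = fetch_value()
    print(type(coro).__name__)
    coro.close()  # avoid the "never awaited" warning
    print(sync_caller())
```

Forgetting await does not raise an error at the call site; the call simply returns a coroutine object that never runs, which is the pitfall the question refers to.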
The thread approach is free of all these inconveniences. But the async - await approach makes it possible to process many more concurrent tasks than the thread approach does. How is that possible?
For each concurrent task we still have a call stack, only now it is a coroutine call stack. I am not quite sure, but it looks like this is the key difference: usual stacks are operating-system stacks, they are expensive; coroutine stacks are just Python structures, they are much cheaper. Is my understanding correct?
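A rough way to see the idea is to start a large number of tasks in one event loop. The sketch below is only an illustration under assumptions: waiter and run_many are hypothetical names, and the task count is arbitrary. Each suspended task keeps its state in ordinary Python objects on the heap rather than in a separate OS thread stack.

```python
import asyncio


async def waiter(delay: float) -> int:
    # While suspended here, the coroutine's frame is a plain Python object.
    await asyncio.sleep(delay)
    return 1


async def run_many(n: int) -> int:
    # Create n tasks in a single event loop; no extra threads are requested.
    tasks = [asyncio.create_task(waiter(0.01)) for _ in range(n)]
    results = await asyncio.gather(*tasks)
    return sum(results)


if __name__ == "__main__":
    # 10_000 concurrent tasks is an arbitrary number chosen for illustration.
    print(asyncio.run(run_many(10_000)))
```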
If this is correct, wouldn't it be better to decouple python threads/call stacks from OS threads/call stacks to make python threads cheaper?
Sorry if this question is stupid. I am sure there are some reasons why the async - await approach was selected. I just want to understand these reasons.
Update:
For those who think this question is not good and too broad.
Here is an article, Unyielding, which starts with an explanation of why threads are bad and advocates the async approach. Main thesis: threads are evil; it's too difficult to reason about a routine that may be executed from an arbitrary number of threads concurrently.
Thanks to Nathaniel J. Smith (author of the Python Trio library), who suggested this link.
By the way, the arguments in the article are not convincing to me, but they may still be useful.
One of the cool advantages of asyncio is that it scales far better than threading. Each task takes far fewer resources and less time to create than a thread, so creating and running more of them works well. This example just creates a separate task for each site to download, which works out quite well.
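The example that snippet refers to is not included here, so the following is only a sketch of the "one task per site" pattern under stated assumptions: the site names are made up, download is a hypothetical helper, and asyncio.sleep stands in for the real network request.

```python
import asyncio
import time

# Hypothetical site names; no real network access happens in this sketch.
SITES = ["site-a.example", "site-b.example", "site-c.example"]


async def download(site: str, delay: float = 0.2) -> str:
    # asyncio.sleep simulates waiting for the network.
    await asyncio.sleep(delay)
    return f"{site}: done"


async def download_all(sites):
    # One task per site; all of them wait concurrently.
    tasks = [asyncio.create_task(download(site)) for site in sites]
    return await asyncio.gather(*tasks)


def timed_run():
    start = time.perf_counter()
    results = asyncio.run(download_all(SITES))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run()
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the waits overlap, the total time should be close to the longest single delay rather than the sum of all delays.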
Async methods don't require multithreading because an async method doesn't run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active. You can use Task.
Tasks + async/await are faster in this case than pure multithreaded code. It's the simplicity which makes async/await so appealing.
Asynchronous programming is a programming paradigm that enables better concurrency, that is, multiple tasks making progress at the same time. In Python, the asyncio module provides this capability. Multiple tasks can run concurrently on a single thread, which is scheduled on a single CPU core.
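The single-thread claim can be checked with a small sketch. The names worker and collect_events are hypothetical; the code records the thread identifier at each step of two interleaved tasks.

```python
import asyncio
import threading


async def worker(name: str, events: list) -> None:
    # Record which thread runs this task before and after it is suspended.
    events.append((name, "start", threading.get_ident()))
    await asyncio.sleep(0.01)
    events.append((name, "end", threading.get_ident()))


async def collect_events() -> list:
    events: list = []
    await asyncio.gather(worker("a", events), worker("b", events))
    return events


if __name__ == "__main__":
    events = asyncio.run(collect_events())
    for event in events:
        print(event)
    # Every event carries the same thread identifier.
    print(len({thread_id for _, _, thread_id in events}))
```

Both tasks start before either one finishes, yet all steps report the same thread identifier: the interleaving comes from the event loop switching tasks at each await, not from additional threads.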
This article answers your questions.
TL;DR?
Threading in Python is inefficient because of the GIL (Global Interpreter Lock, in CPython), which means that multiple threads cannot run Python code in parallel as you would expect on a multi-processor system. Plus, you have to rely on the interpreter to switch between threads, which adds to the inefficiency.
async/asyncio allows concurrency within a single thread. This gives you, as the developer, much more fine-grained control of the task switching and can give much better performance for concurrent I/O-bound tasks than Python threading.
The 3rd approach that you don't mention is multiprocessing. This approach uses processes for concurrency and allows programs to make full use of hardware with multiple cores.
Asyncio is a wholly different world, and AFAIK it's Python's answer to node.js, which has done these things since the start. E.g. this official Python doc about asyncio states:
Asynchronous programming is different than classical “sequential” programming
So you'd need to decide if you want to jump into that rabbit hole and learn this terminology. It probably only makes sense if you're faced with heavy network- or disk-related tasks. If you are, then e.g. this article claims that Python 3's asyncio might be faster than node.js and close to the performance of Go.
That said: I've not used asyncio yet, so I cannot really comment on this, but I can comment on a few sentences from your question:
And all this async - await syntactic garbage all around the program is only to indicate that this method is able to yield control to message loop
As far as I can see you have an initial setup of asyncio, but then all the calls have less syntax around them than doing the same things with threads, which you need to start() and join() and probably also check with is_alive(), and to fetch the return value you need to set up a shared object first. So: no, asyncio just looks different, but in the end the program will most probably look cleaner than with threads.
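To make that comparison concrete, here is a minimal side-by-side sketch. All function names are hypothetical, and squaring a number stands in for real work. The thread version uses start(), join() and a shared dictionary for results; the asyncio version returns values directly through asyncio.gather.

```python
import asyncio
import threading


def square_in_thread(n: int, results: dict) -> None:
    # A thread target cannot return a value to the caller directly,
    # so the result goes into a shared object.
    results[n] = n * n


def run_with_threads(numbers):
    results: dict = {}
    threads = [
        threading.Thread(target=square_in_thread, args=(n, results))
        for n in numbers
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [results[n] for n in numbers]


async def square_async(n: int) -> int:
    await asyncio.sleep(0)  # yield to the event loop once
    return n * n


async def run_with_asyncio(numbers):
    # gather returns the results in the order the awaitables were passed.
    return await asyncio.gather(*(square_async(n) for n in numbers))


if __name__ == "__main__":
    print(run_with_threads([1, 2, 3]))
    print(asyncio.run(run_with_asyncio([1, 2, 3])))
```

The standard library's concurrent.futures.ThreadPoolExecutor also hands back return values from threads, so the shared-object step applies to raw threading.Thread rather than to threads in general.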
As I understand the only problem with this approach is that threads are expensive
Not really. Starting a new thread is very inexpensive and has AFAIK the same cost as starting a "native thread" in C or Java
looks like this is the key difference: usual stacks are operating-system stacks, they are expensive; coroutine stacks are just Python structures, they are much cheaper. Is my understanding correct?
Not really. Nothing beats creating OS-level threads; they are cheap. What asyncio is better at is that you need fewer thread switches. So if you have many concurrent threads waiting for network or disk, then asyncio would probably speed things up.