Basically, I'm looking for something that offers a parallel map using python3 coroutines as the backend instead of threads or processes. I believe there should be less overhead when performing highly parallel IO work.
Surely something similar already exists, be it in the standard library or some widely used package?
Functions are first-class objects in Python, which means they can be passed as arguments to other functions. asyncio ships with the awaitable asyncio.gather() function. It runs the awaitables passed to it concurrently and returns their results in the order in which the awaitables were given.
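As a minimal sketch of how asyncio.gather() behaves: the fetch coroutine and its delays below are hypothetical stand-ins for real I/O, and asyncio.run() assumes Python 3.7 or newer.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical stand-in for an I/O call: it only sleeps.
    await asyncio.sleep(delay)
    return name


async def main():
    # The slowest awaitable is listed first, yet the results
    # come back in argument order, not in completion order.
    return await asyncio.gather(
        fetch("a", 0.3),
        fetch("b", 0.1),
        fetch("c", 0.2),
    )


results = asyncio.run(main())
print(results)  # ['a', 'b', 'c']
```

All three calls overlap, so the whole run takes roughly as long as the slowest one rather than the sum of the delays.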
Python's asyncio package (introduced in Python 3.4) and the two keywords async and await (added in Python 3.5) serve different purposes but come together to help you declare, build, execute, and manage asynchronous code.
Due to the global interpreter lock (GIL), Python threads do not provide efficient multi-core execution, unlike concurrency in other languages such as Go. Asynchronous programming in Python focuses on single-core execution.
Python provides mechanisms for both concurrency and parallelism, each with its own syntax and use cases. Python has two different mechanisms for implementing concurrency, although they share many common components. These are threading and coroutines, or async.
DISCLAIMER: PEP 0492 defines only syntax and usage for coroutines. They require an event loop to run, which is most likely asyncio's event loop.
I don't know of any implementation of map based on coroutines. However, it's trivial to implement basic map functionality using asyncio.gather():
    import asyncio


    def async_map(coroutine_func, iterable):
        loop = asyncio.get_event_loop()
        # one coroutine per item, joined into a single future
        future = asyncio.gather(*(coroutine_func(param) for param in iterable))
        return loop.run_until_complete(future)
This implementation is really simple. It creates a coroutine for each item in the iterable, joins them into a single future, and executes the joined future on the event loop.
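On newer Python versions the same idea is usually written with asyncio.run() rather than a manually fetched loop. The following is a sketch under that assumption (Python 3.7+), not a drop-in replacement for the version above; the square coroutine is a hypothetical example workload.

```python
import asyncio


def async_map(coroutine_func, iterable):
    async def run_all():
        # gather() is called while the loop is running, so no
        # explicit loop object is needed.
        return await asyncio.gather(
            *(coroutine_func(param) for param in iterable)
        )

    # asyncio.run() creates a fresh event loop, runs run_all()
    # to completion, and closes the loop afterwards.
    return asyncio.run(run_all())


async def square(val):
    # Hypothetical workload: a short sleep standing in for I/O.
    await asyncio.sleep(0.1)
    return val * val


if __name__ == "__main__":
    print(async_map(square, range(5)))  # [0, 1, 4, 9, 16]
```

Because asyncio.run() starts its own loop, this variant cannot be called from code that is already running inside an event loop.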
The provided implementation covers only some of the cases, and it has a problem: with a long iterable you would probably want to limit the number of coroutines running in parallel. I can't come up with a simple implementation that is efficient and preserves order at the same time, so I will leave it as an exercise for the reader.
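One possible starting point for that exercise is to guard each call with an asyncio.Semaphore. This is only a sketch: bounded_map and its limit parameter are hypothetical names, asyncio.run() assumes Python 3.7+, and one coroutine object is still created per item up front, so it bounds concurrency rather than memory.

```python
import asyncio


async def bounded_map(coroutine_func, iterable, limit=100):
    """Apply coroutine_func to each item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(param):
        # Wait for a free slot before starting the actual work.
        async with semaphore:
            return await coroutine_func(param)

    # gather() returns results in argument order, so the
    # order of the input iterable is preserved.
    return await asyncio.gather(*(run_one(param) for param in iterable))


async def square(val):
    # Hypothetical workload: a short sleep standing in for I/O.
    await asyncio.sleep(0.01)
    return val * val


if __name__ == "__main__":
    print(asyncio.run(bounded_map(square, range(10), limit=3)))
```

Order is preserved because gather() collects results by position; the semaphore only delays when each call starts.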
You claimed:
I believe there should be less overhead when performing highly parallel IO work.
It requires proof, so here is a comparison of a multiprocessing implementation, a gevent implementation by a p, and my implementation based on coroutines. All tests were performed on Python 3.5.
Implementation using multiprocessing:

    from multiprocessing import Pool
    import time


    def async_map(f, iterable):
        with Pool(len(iterable)) as p:  # run one process per item to measure overhead only
            return p.map(f, iterable)


    def func(val):
        time.sleep(1)
        return val * val
Implementation using gevent:

    import gevent
    from gevent.pool import Group


    def async_map(f, iterable):
        group = Group()
        return group.map(f, iterable)


    def func(val):
        gevent.sleep(1)
        return val * val
Implementation using asyncio:

    import asyncio


    def async_map(f, iterable):
        loop = asyncio.get_event_loop()
        future = asyncio.gather(*(f(param) for param in iterable))
        return loop.run_until_complete(future)


    async def func(val):
        await asyncio.sleep(1)
        return val * val
The test program is the usual timeit:

    $ python3 -m timeit -s 'from perf.map_mp import async_map, func' -n 1 'async_map(func, list(range(10)))'
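A similar measurement can be scripted directly in Python. This is a rough sketch, not the harness used for the numbers in this answer: measure and gather_all are hypothetical helpers, the sleep is shortened to 0.1 s, and asyncio.run() assumes Python 3.7+.

```python
import asyncio
import time


async def func(val):
    await asyncio.sleep(0.1)  # shorter than the 1 s used in the benchmark
    return val * val


async def gather_all(f, iterable):
    # One coroutine per item, all awaited concurrently.
    return await asyncio.gather(*(f(param) for param in iterable))


def measure(n):
    """Return wall-clock seconds spent mapping func over n items."""
    start = time.perf_counter()
    asyncio.run(gather_all(func, range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "items:", round(measure(n), 3), "sec")
```

Whatever time is reported beyond the sleep duration is a rough estimate of the scheduling overhead for that number of items.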
Results:

Iterable of 10 items:
- multiprocessing - 1.05 sec
- gevent - 1 sec
- asyncio - 1 sec

Iterable of 100 items:
- multiprocessing - 1.16 sec
- gevent - 1.01 sec
- asyncio - 1.01 sec

Iterable of 500 items:
- multiprocessing - 2.31 sec
- gevent - 1.02 sec
- asyncio - 1.03 sec

Iterable of 5000 items:
- multiprocessing - failed (spawning 5k processes is not such a good idea!)
- gevent - 1.12 sec
- asyncio - 1.22 sec

Iterable of 50000 items:
- gevent - 2.2 sec
- asyncio - 3.25 sec

Concurrency based on an event loop works faster when the program does mostly I/O rather than computation. Keep in mind that the difference will be smaller when less I/O and more computation is involved.
The overhead introduced by spawning processes is significantly bigger than the overhead introduced by event-loop-based concurrency. It means that your assumption is correct.
Comparing asyncio and gevent, we can say that asyncio has 33-45% bigger overhead. It means that creating greenlets is cheaper than creating coroutines.
As a final conclusion: gevent has better performance, but asyncio is part of the standard library. The difference in performance (in absolute numbers) isn't very significant. gevent is quite a mature library, while asyncio is relatively new, but it advances quickly.