I'm working with Python multiprocessing to spawn some workers. Each of them should return an array that's a few MB in size.
I'm afraid I have a few gaps in my understanding of how Python implements multiprocessing, and trying to persuade PyPy to play nice is not making things any easier. Thanks!
Yes, if the return array is created in the child process, it must be sent to the parent by pickling it, sending the pickled bytes back to the parent via a `Pipe`, and then unpickling the object in the parent. For a large object, this is pretty slow in CPython, so it's not just a PyPy issue. It is possible that performance is worse in PyPy, though; I haven't tried comparing the two, but this PyPy bug seems to suggest that `multiprocessing` in PyPy is slower than in CPython.
In CPython, there is a way to allocate ctypes objects in shared memory, via `multiprocessing.sharedctypes`. PyPy seems to support this API, too. The limitation (obviously) is that you're restricted to ctypes objects.
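A rough sketch of that approach, assuming your data fits a flat ctypes array (names and sizes here are illustrative): the parent allocates the array in shared memory and the child writes into it directly, so there's no large pickle/unpickle round trip for the result.

```python
import multiprocessing as mp
from multiprocessing import sharedctypes

def fill(shared_arr, n):
    # Writes go directly into the shared memory block.
    for i in range(n):
        shared_arr[i] = float(i)

if __name__ == "__main__":
    n = 1_000_000
    # RawArray has no lock; use sharedctypes.Array if you need synchronization.
    arr = sharedctypes.RawArray('d', n)
    p = mp.Process(target=fill, args=(arr, n))
    p.start()
    p.join()
    # Values written by the child are visible in the parent without copying.
    print(arr[0], arr[n - 1])
```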
There is also `multiprocessing.Manager`, which would allow you to create a shared array/list object in a `Manager` process; both the parent and child could then access the shared list via a `Proxy` object. The downside is that read/write performance on the proxied object is much slower than for a local object, or even for a roughly equivalent object created using `multiprocessing.sharedctypes`.
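A small sketch of the `Manager` approach (sizes and names are illustrative): the list lives in a separate manager process, and every read or write through the proxy is a round trip to that process, which is why access is comparatively slow.

```python
import multiprocessing as mp

def worker(shared_list, n):
    # extend() sends the whole chunk in one proxy call, which is much
    # cheaper than appending one element at a time through the proxy.
    shared_list.extend([i * i for i in range(n)])

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared = manager.list()
        p = mp.Process(target=worker, args=(shared, 10_000))
        p.start()
        p.join()
        print(len(shared), shared[0], shared[-1])
```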