Returning large objects from child processes in python multiprocessing

I'm working with Python multiprocessing to spawn some workers. Each of them should return an array that's a few MB in size.

  1. Is it correct that since my return array is created in the child process, it needs to be copied back to the parent's memory when the process ends? (This seems to take a while, but it might be a PyPy issue.)
  2. Is there a mechanism to allow the parent and child to access the same in-memory object? (Synchronization is not an issue, since only one child would access each object.)

I'm afraid I have a few gaps in my understanding of how Python implements multiprocessing, and trying to persuade PyPy to play nicely is not making things any easier. Thanks!

Miquel asked Feb 12 '23 18:02

1 Answer

Yes, if the return array is created in the child process, it has to be sent back to the parent: the child pickles it, the pickled bytes are sent over a Pipe, and the parent unpickles them. For a large object this is fairly slow even in CPython, so it's not just a PyPy issue. It is possible that performance is worse in PyPy, though; I haven't compared the two, but this PyPy bug seems to suggest that multiprocessing in PyPy is slower than in CPython.
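To make the cost concrete, here's a minimal sketch (mine, not from the question or answer; the worker function and sizes are just illustrative) that times a pool worker returning a multi-megabyte list. The elapsed time includes the pickle → Pipe → unpickle round trip described above:

```python
import multiprocessing
import time

def make_big_array(n):
    # Build a list of n distinct floats in the child; returning it forces a
    # pickle -> Pipe -> unpickle round trip back into the parent's memory.
    return [float(i) for i in range(n)]

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        start = time.time()
        results = pool.map(make_big_array, [1_000_000, 1_000_000])
        print("received", len(results), "arrays in %.2f seconds" % (time.time() - start))
```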

In CPython, there is a way to allocate ctypes objects in shared memory, via multiprocessing.sharedctypes. PyPy seems to support this API, too. The limitation (obviously) is that you're restricted to ctypes objects.
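Here's a rough sketch of that approach, assuming a plain RawArray of doubles is enough for your data; the child writes into the shared buffer in place, so nothing needs to be pickled back to the parent:

```python
import ctypes
import multiprocessing
from multiprocessing import sharedctypes

def fill(arr, n):
    # The child writes directly into the shared-memory buffer; no copy is
    # made when the process exits.
    for i in range(n):
        arr[i] = float(i)

if __name__ == "__main__":
    n = 1_000_000  # about 8 MB of C doubles
    arr = sharedctypes.RawArray(ctypes.c_double, n)
    p = multiprocessing.Process(target=fill, args=(arr, n))
    p.start()
    p.join()
    print(arr[0], arr[n - 1])  # the parent sees the child's writes
```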

There is also multiprocessing.Manager, which would allow you to create a shared array/list object in a Manager process; both the parent and child can then access the shared list via a Proxy object. The downside is that read/write performance on the object is much slower than for a local object, or even for a roughly equivalent object created using multiprocessing.sharedctypes.
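A similar sketch using a Manager (again mine, just to illustrate the pattern); every call on the proxy is forwarded to the manager process, which is where the slowdown comes from:

```python
import multiprocessing

def fill(shared_list, n):
    # Each operation on the proxy is sent to the manager process, which is
    # why per-element access is comparatively slow.
    shared_list.extend([float(i) for i in range(n)])

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_list = manager.list()
        p = multiprocessing.Process(target=fill, args=(shared_list, 10))
        p.start()
        p.join()
        print(list(shared_list))  # the parent reads the same managed object
```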

dano answered Feb 15 '23 09:02