How do I give child processes access to data in shared memory if the data is only available after the child processes have been spawned (using multiprocessing.Process)?
I am aware of multiprocessing.sharedctypes.RawArray, but I can't figure out how to give my child processes access to a RawArray
that is created after the processes have already started.
The data is generated by the parent process, and the amount of data is not known in advance.
If not for the GIL I'd be using threading instead which will make this task a little simpler. Using a non-CPython implementation is not an option.
Looking under the hood of muliprocessing.sharedctypes, it looks like shared ctype objects are allocated using mmap
ed memory.
So this question really boils down to: Can a child process access an anonymously mapped memory if mmap()
was called by the parent after the child process was spawned?
That's somewhat in the vein of what's being asked in this question, except that in my case the caller of mmap()
is the parent process and not the child process.
I created my own version of RawArray
that uses shm_open()
under the hood. The resulting shared ctypes array can be shared with any process as long as the identifier (tag
) matches.
See this answer for details and an example.
Disclaimer: I am the author of the question.
I eventually used the posix_ipc module to create my own version of RawArray. I used mainly posix_ipc.SharedMemory
which calls shm_open()
under the hood.
My implementation (ShmemRawArray
) exposes the same functionality as RawArray
but required two additional parameters - a tag
to uniquely identify the shared memory region, and a create
flag to determine if we should be created a new shared memory segment or attach to an existing one.
Here's a copy if anyone's interested: https://gist.github.com/1222327
ShmemRawArray(typecode_or_type, size_or_initializer, tag, create=True)
Usage notes:
typecode_or_type
and size_or_initializer
) should work the same as with RawArray
.tag
matches.ShmemRawArray(..., create=True)
) is deletedtag
that currently exists will raise an ExistentialError
tag
that doesn't exist (or one that has been unlinked) will also raise an ExistentialError
A SSCCE (Short, Self Contained, Compilable Example) showing it in action.
#!/usr/bin/env python2.7
import ctypes
import multiprocessing
from random import random, randint
from shmemctypes import ShmemRawArray
class Point(ctypes.Structure):
_fields_ = [ ("x", ctypes.c_double), ("y", ctypes.c_double) ]
def worker(q):
# get access to ctypes array shared by parent
count, tag = q.get()
shared_data = ShmemRawArray(Point, count, tag, False)
proc_name = multiprocessing.current_process().name
print proc_name, ["%.3f %.3f" % (d.x, d.y) for d in shared_data]
if __name__ == '__main__':
procs = []
np = multiprocessing.cpu_count()
queue = multiprocessing.Queue()
# spawn child processes
for i in xrange(np):
p = multiprocessing.Process(target=worker, args=(queue,))
procs.append(p)
p.start()
# create a unique tag for shmem segment
tag = "stack-overflow-%d" % multiprocessing.current_process().pid
# random number of points with random data
count = randint(3,10)
combined_data = [Point(x=random(), y=random()) for i in xrange(count)]
# create ctypes array in shared memory using ShmemRawArray
# - we won't be able to use multiprocssing.sharectypes.RawArray here
# because children already spawned
shared_data = ShmemRawArray(Point, combined_data, tag)
# give children info needed to access ctypes array
for p in procs:
queue.put((count, tag))
print "Parent", ["%.3f %.3f" % (d.x, d.y) for d in shared_data]
for p in procs:
p.join()
Running this results in the following output:
[me@home]$ ./shmem_test.py
Parent ['0.633 0.296', '0.559 0.008', '0.814 0.752', '0.842 0.110']
Process-1 ['0.633 0.296', '0.559 0.008', '0.814 0.752', '0.842 0.110']
Process-2 ['0.633 0.296', '0.559 0.008', '0.814 0.752', '0.842 0.110']
Process-3 ['0.633 0.296', '0.559 0.008', '0.814 0.752', '0.842 0.110']
Process-4 ['0.633 0.296', '0.559 0.008', '0.814 0.752', '0.842 0.110']
Your problem sounds like a perfect fit for the posix_ipc
or sysv_ipc
modules, which expose either the POSIX or SysV APIs for shared memory, semaphores, and message queues. The feature matrix there includes excellent advice for picking amongst the modules he provides.
The problem with anonymous mmap(2)
areas is that you cannot easily share them with other processes -- if they were file-backed, it'd be easy, but if you don't actually need the file for anything else, it feels silly. You could use the CLONE_VM
flag to the clone(2)
system call if this were in C, but I wouldn't want to try using it with a language interpreter that probably makes assumptions about memory safety. (It'd be a little dangerous even in C, as maintenance programmers five years from now might also be shocked by the CLONE_VM
behavior.)
But the SysV and newer POSIX shared memory mappings allow even unrelated processes to attach and detach from shared memory by identifier, so all you need to do is share the identifier from the processes that create the mappings with the processes that consume the mappings, and then when you manipulate data within the mappings, they are available to all processes simultaneously without any additional parsing overhead. The shm_open(3)
function returns an int
that is used as a file descriptor in later calls to ftruncate(2)
and then mmap(2)
, so other processes can use the shared memory segment without a file being created in the filesystem -- and this memory will persist even if all processes using it have exited. (A little strange for Unix, perhaps, but it is flexible.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With