Sharing numpy arrays between multiple processes without inheritance

I would like to share numpy arrays between multiple processes. There are working solutions here. However, they all pass the arrays to the child processes through inheritance, which does not work for me because I have to start a few worker processes beforehand and I don't know how many arrays I will have to deal with later on. Is there any way to create such arrays after the processes are started and pass them to the processes via queues?

Btw for some reason I'm not able to use multiprocessing.Manager.

Asked Jan 16 '16 by shaoyl85

1 Answer

You should use shared memory, which solves exactly your use case. You keep full memory read/write speed, and all processes can read and write the array in shared memory without incurring any serialization or transport cost.

Below is the example from the official Python documentation for multiprocessing.shared_memory (available since Python 3.8):

>>> # In the first Python interactive shell
>>> import numpy as np
>>> a = np.array([1, 1, 2, 3, 5, 8])  # Start with an existing NumPy array
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> # Now create a NumPy array backed by shared memory
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:]  # Copy the original data into shared memory
>>> b
array([1, 1, 2, 3, 5, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> type(a)
<class 'numpy.ndarray'>
>>> shm.name  # We did not specify a name so one was chosen for us
'psm_21467_46075'
>>> # In either the same shell or a new Python shell on the same machine
>>> import numpy as np
>>> from multiprocessing import shared_memory
>>> # Attach to the existing shared memory block
>>> existing_shm = shared_memory.SharedMemory(name='psm_21467_46075')
>>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example
>>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
>>> c
array([1, 1, 2, 3, 5, 8])
>>> c[-1] = 888
>>> c
array([  1,   1,   2,   3,   5, 888])
>>> # Back in the first Python interactive shell, b reflects this change
>>> b
array([  1,   1,   2,   3,   5, 888])
>>> # Clean up from within the second Python shell
>>> del c  # Unnecessary; merely emphasizing the array is no longer used
>>> existing_shm.close()
>>> # Clean up from within the first Python shell
>>> del b  # Unnecessary; merely emphasizing the array is no longer used
>>> shm.close()
>>> shm.unlink()  # Free and release the shared memory block at the very end

For a real use case like yours, you would pass the name shm.name through a Pipe, a Queue, or any other multiprocessing communication mechanism, together with the array's shape and dtype so the receiving process can reconstruct the array. Note that only this tiny descriptor needs to be exchanged between processes; the actual data stays in shared memory.
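For instance, here is a minimal sketch of that pattern (my own illustration, not from the official docs): a worker started ahead of time receives (name, shape, dtype) descriptors over a multiprocessing.Queue, attaches to each block on demand, and modifies the array in place. The worker/main structure and the None sentinel are illustrative choices, not part of the shared_memory API.

import numpy as np
from multiprocessing import Process, Queue, shared_memory

def worker(q):
    while True:
        msg = q.get()
        if msg is None:          # sentinel: no more arrays coming
            break
        name, shape, dtype = msg
        shm = shared_memory.SharedMemory(name=name)  # attach to existing block
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        arr *= 2                 # modify the shared data in place
        del arr                  # drop our view of the buffer before closing
        shm.close()              # detach; the parent still owns the block

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()                    # the worker exists before any array is created

    a = np.arange(6, dtype=np.int64)
    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    b[:] = a                     # copy the data into shared memory

    q.put((shm.name, b.shape, b.dtype))  # only the descriptor crosses the queue
    q.put(None)
    p.join()

    print(b)                     # [ 0  2  4  6  8 10] -- the worker's change is visible
    del b
    shm.close()
    shm.unlink()                 # free the block once everyone is done

The same queue can carry descriptors for as many arrays as you create later on; each one costs only a few bytes to transmit.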

Answered Sep 20 '22 by M1L0U