I want to share a numpy array across multiple processes. The processes only read the data, so I want to avoid making copies. I know how to do it if I can start with a multiprocessing.sharedctypes.RawArray and then create a numpy array using numpy.frombuffer. But what if I am initially given a numpy array? Is there a way to initialize a RawArray with the numpy array's data without copying the data? Or is there another way to share the data across the processes without copying it?
To my knowledge it is not possible to declare memory as shared after it has already been allocated by a specific process. Similar discussions can be found here and here (more suitable).
Let me quickly sketch the workaround you mentioned (starting with a RawArray and getting a numpy.ndarray reference to it).
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray
# option 1: allocate a fresh shared buffer of 12 C ints
raw_arr = RawArray(ctypes.c_int, 12)
# option 2: set it up similar to some existing np.ndarray np_arr2
raw_arr = RawArray(
np.ctypeslib.as_ctypes_type(np_arr2.dtype), len(np_arr2)
)
np_arr = np.frombuffer(raw_arr, dtype=raw_arr._type_)
# np_arr: numpy array with shared memory, can be processed by multiprocessing
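For completeness, here is a minimal sketch of how such a shared array could be handed to read-only worker processes via a pool initializer; the names init_worker and worker (and the module-level shared dict) are just illustrative assumptions, not part of your code:

import ctypes
import numpy as np
from multiprocessing import Pool
from multiprocessing.sharedctypes import RawArray

shared = {}  # per-process storage for the shared buffer

def init_worker(raw_arr):
    # runs once in every worker: wrap the inherited buffer without copying
    shared['np_arr'] = np.frombuffer(raw_arr, dtype=raw_arr._type_)

def worker(i):
    # read-only access to the shared data
    return int(shared['np_arr'][i])

if __name__ == '__main__':
    raw_arr = RawArray(ctypes.c_int, 12)
    np_arr = np.frombuffer(raw_arr, dtype=raw_arr._type_)
    np_arr[:] = np.arange(12)  # fill the shared buffer before starting the workers
    with Pool(2, initializer=init_worker, initargs=(raw_arr,)) as pool:
        print(pool.map(worker, range(12)))  # [0, 1, ..., 11]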
If you have to start with a numpy.ndarray, you have no other choice than to copy the data:
import numpy as np
from multiprocessing.sharedctypes import RawArray
np_arr = np.zeros(shape=(3, 4), dtype=np.ubyte)
# option 1: via a ctypes view of the array (the data is copied into the RawArray)
tmp = np.ctypeslib.as_ctypes(np_arr)
raw_arr = RawArray(tmp._type_, tmp)
# option 2: via a flattened copy of the array
raw_arr = RawArray(np.ctypeslib.as_ctypes_type(np_arr.dtype), np_arr.flatten())
print(raw_arr[:])
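As a quick sanity check that the RawArray really holds its own copy (and to get the original 2-D shape back on the numpy side), something along these lines should work; shared_view is just an illustrative name:

shared_view = np.frombuffer(raw_arr, dtype=np_arr.dtype).reshape(np_arr.shape)
np_arr[0, 0] = 255         # modifying the original array ...
print(shared_view[0, 0])   # ... does not change the shared copy: prints 0
shared_view[0, 1] = 7      # writes to the view go straight into the shared buffer
print(raw_arr[1])          # prints 7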