I'm in a situation where I need to parallel process a very big numpy array (55x117x256x256). Trying to pass it around with the usual multiprocessing approach gives an AssertionError, which I understand to be because the array is too big to copy into each process. Because of this, I would like to try using shared memory with multiprocessing. (I'm open to other approaches, provided they aren't too complicated).
I've seen a few questions asking about the use of Python multiprocessing's shared-memory approach, e.g.
import numpy as np
import multiprocessing as mp
unsharedData = np.zeros((10,))
sharedData = mp.Array('d', unsharedData)
which seem to work fine. However, I haven't yet seen an example where this is done with a multidimensional array.
I've tried just sticking the multidimensional array into mp.Array, which gives me TypeError: only size-1 arrays can be converted to Python scalars:
unsharedData2 = np.zeros((10, 10))
sharedData2 = mp.Array('d', unsharedData2)  # Gives TypeError
I can flatten the array, but I'd rather not if it can be avoided.
Is there some trick to get multiprocessing Array to handle multidimensional data?
You can use np.reshape((-1,)) or np.ravel instead of ndarray.flatten to get a 1-dimensional view of your array, avoiding the unnecessary copy that flatten makes:
import numpy as np
import multiprocessing as mp

unsharedData2 = np.zeros((10, 10))
ravel_view = unsharedData2.ravel()           # view, no copy
reshape_view = unsharedData2.reshape((-1,))  # view, no copy
ravel_view[11] = 1.0    # -> stores 1.0 in unsharedData2 at [1, 1]
reshape_view[22] = 2.0  # -> stores 2.0 in unsharedData2 at [2, 2]
sharedData2 = mp.Array('d', ravel_view)
sharedData2 = mp.Array('d', reshape_view)
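To actually work on the shared buffer from another process, you still have to re-wrap it as a NumPy array on each side and restore the original shape. Here is a minimal sketch of the round trip; the worker function and variable names are just illustrative, not part of the original question:

import numpy as np
import multiprocessing as mp

def worker(shared_arr, shape):
    # Wrap the shared ctypes buffer as a NumPy array (a view, no copy)
    # and restore the original multidimensional shape.
    arr = np.frombuffer(shared_arr.get_obj()).reshape(shape)
    arr[1, 1] = 42.0  # visible to the parent, since the memory is shared

if __name__ == '__main__':
    shape = (10, 10)
    data = np.zeros(shape)
    # mp.Array copies the flattened data into shared memory once.
    shared = mp.Array('d', data.reshape(-1))
    p = mp.Process(target=worker, args=(shared, shape))
    p.start()
    p.join()
    result = np.frombuffer(shared.get_obj()).reshape(shape)
    print(result[1, 1])  # 42.0

The reshape calls on the receiving side only create views onto the shared buffer, so no extra copies of the large array are made after the initial fill.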