 

Multiprocessing - shared memory with multidimensional numpy array

I'm in a situation where I need to parallel process a very big numpy array (55x117x256x256). Trying to pass it around with the usual multiprocessing approach gives an AssertionError, which I understand to be because the array is too big to copy into each process. Because of this, I would like to try using shared memory with multiprocessing. (I'm open to other approaches, provided they aren't too complicated).

I've seen a few questions about using Python multiprocessing's shared memory approach, e.g.

import numpy as np
import multiprocessing as mp

unsharedData = np.zeros((10,))
sharedData = mp.Array('d', unsharedData)

which seem to work fine. However, I haven't yet seen an example where this is done with a multidimensional array.

I've tried just sticking the multidimensional array into mp.Array, which gives me TypeError: only size-1 arrays can be converted to Python scalars.

unsharedData2 = np.zeros((10, 10))
sharedData2 = mp.Array('d', unsharedData2)  # Gives TypeError

I can flatten the array, but I'd rather not if it can be avoided.

Is there some trick to get multiprocessing Array to handle multidimensional data?

asked May 10 '18 by Theolodus

1 Answer

You can use reshape(-1) or np.ravel instead of ndarray.flatten to get a one-dimensional view of your array: flatten always copies, while ravel and reshape return a view whenever the array is contiguous, so there is no unnecessary copying:

import numpy as np
import multiprocessing as mp

unsharedData2 = np.zeros((10, 10))
ravel_view = np.ravel(unsharedData2)        # a view, not a copy (the array is contiguous)
reshape_view = unsharedData2.reshape(-1)    # likewise a flat view of the same buffer
ravel_view[11] = 1.0       # -> writes 1.0 through to unsharedData2[1, 1]
reshape_view[22] = 2.0     # -> writes 2.0 through to unsharedData2[2, 2]
sharedData2 = mp.Array('d', ravel_view)     # either flat view works as the initializer;
sharedData2 = mp.Array('d', reshape_view)   # note mp.Array copies the values into shared memory
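
For completeness, here is a minimal sketch of one common way to actually use such a shared Array from worker processes: re-wrap its underlying buffer as a multidimensional numpy view with np.frombuffer. The fill_cell worker and the names below are illustrative assumptions, not part of the original answer.

import numpy as np
import multiprocessing as mp

def fill_cell(shared, shape, index, value):
    # Re-wrap the shared buffer as an ndarray inside the child process;
    # nothing is copied, so parent and child see the same memory.
    arr = np.frombuffer(shared.get_obj()).reshape(shape)  # dtype defaults to float64, matching 'd'
    arr[index] = value

if __name__ == '__main__':
    shape = (10, 10)
    shared = mp.Array('d', int(np.prod(shape)))  # zero-initialised shared doubles
    p = mp.Process(target=fill_cell, args=(shared, shape, (2, 3), 7.0))
    p.start()
    p.join()
    result = np.frombuffer(shared.get_obj()).reshape(shape)
    print(result[2, 3])  # 7.0 -- the child's write is visible in the parent

Note that writes through the np.frombuffer view bypass the lock that mp.Array provides; if several workers may touch the same elements, guard the writes with shared.get_lock().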
answered Oct 15 '22 by dankal444