 

Efficient way of merging two numpy masked arrays

I have two numpy masked arrays which I want to merge. I'm using the following code:

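import matplotlib.pyplot as plt
import numpy as np

# First array: ones in the top-left quadrant, zeros (masked) elsewhere
a = np.zeros((10000, 10000), dtype=np.int16)
a[:5000, :5000] = 1
am = np.ma.masked_equal(a, 0)

# Second array: twos in a centred square, zeros (masked) elsewhere
b = np.zeros((10000, 10000), dtype=np.int16)
b[2500:7500, 2500:7500] = 2
bm = np.ma.masked_equal(b, 0)

# Stack into a 3D masked array and multiply along the new third axis
arr = np.ma.array(np.dstack((am, bm)), mask=np.dstack((am.mask, bm.mask)))
arr = np.prod(arr, axis=2)
plt.imshow(arr)
plt.show()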

[Figure: plot of the resulting merged array]

The problem is that the np.prod() operation is very slow (about 4 seconds on my computer). Is there a more efficient way to get the merged array?
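To make explicit what the np.prod() call computes, here is a scaled-down sketch of the same pipeline on 2x3 arrays (the input values are invented for illustration, and .data is used so the masks are passed only once). MaskedArray.prod treats masked entries as 1, so where only one input is unmasked its value passes through, where both are unmasked the values are multiplied, and a cell stays masked only where every input is masked.

```python
import numpy as np

# Small stand-ins for the 10000x10000 arrays above (values invented).
a = np.array([[1, 1, 0],
              [1, 0, 0]], dtype=np.int16)
b = np.array([[0, 2, 2],
              [0, 0, 2]], dtype=np.int16)
am = np.ma.masked_equal(a, 0)
bm = np.ma.masked_equal(b, 0)

# Same construction as the question: stack along a third axis, carry the masks.
stacked = np.ma.array(np.dstack((am.data, bm.data)),
                      mask=np.dstack((am.mask, bm.mask)))

# Masked entries count as 1 in the product; a cell is masked only if it is
# masked in every input.
merged = np.prod(stacked, axis=2)
print(merged.filled(-1).tolist())  # [[1, 2, 2], [1, -1, 2]]
```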

Asked Dec 01 '15 05:12 by prl900


2 Answers

Instead of your last two lines using dstack() and prod(), try this:

arr = np.ma.array(am.filled(1) * bm.filled(1), mask=(am.mask * bm.mask))

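Now you don't need prod() at all, and you avoid allocating the 3D array entirely. Filling the masked cells with 1 makes them neutral in the multiplication, and multiplying the two boolean masks is a logical AND, so a cell is masked only where both inputs are masked.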
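The same idea extends to any number of same-shaped masked arrays. The sketch below (merge_by_product is a hypothetical helper name, not a NumPy function) fills masked entries with 1 before multiplying and combines the masks with a logical AND. It then checks the result against the dstack()/prod() construction from the question on small invented inputs. np.ma.getmaskarray is used so that an input whose mask is nomask is handled too.

```python
import numpy as np

def merge_by_product(arrays):
    """Hypothetical helper: multiply unmasked values, treating masked cells as 1.

    The output is masked only where every input is masked, mirroring
    MaskedArray.prod along a stacking axis without building a 3D array.
    """
    data = arrays[0].filled(1)
    mask = np.ma.getmaskarray(arrays[0])
    for arr in arrays[1:]:
        data = data * arr.filled(1)
        mask = mask & np.ma.getmaskarray(arr)
    return np.ma.array(data, mask=mask)

a = np.array([[1, 1, 0], [1, 0, 0]], dtype=np.int16)
b = np.array([[0, 2, 2], [0, 0, 2]], dtype=np.int16)
am = np.ma.masked_equal(a, 0)
bm = np.ma.masked_equal(b, 0)

fast = merge_by_product([am, bm])

# Reference result using the question's dstack()/prod() approach.
stacked = np.ma.array(np.dstack((am.data, bm.data)),
                      mask=np.dstack((am.mask, bm.mask)))
slow = np.prod(stacked, axis=2)

print(np.array_equal(fast.filled(-1), slow.filled(-1)))  # True
print(np.array_equal(fast.mask, slow.mask))              # True
```

One difference to keep in mind: the elementwise product keeps the input dtype (int16 here), whereas prod() upcasts small integer types to the platform integer, so multiplying many large values could overflow in the elementwise version.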

Answered Oct 01 '22 21:10 by John Zwinck


I took another approach that may not be particularly efficient, but is reasonably easy to extend and implement.

(I know I'm answering a question that is over three years old with functionality that has been in NumPy for a long time, but bear with me.)

The np.where function in NumPy has two main purposes (which is a bit odd). The first is to give you the indices where a boolean array is True:

>>> import numpy as np

>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

>>> m = (a % 3 == 0)
>>> m
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]], dtype=bool)

>>> row_ind, col_ind = np.where(m)
>>> row_ind
array([0, 0, 1, 2])
>>> col_ind
array([0, 3, 2, 1])

The other purpose of the np.where function is to pick from two arrays based on whether the given boolean array is True/False:

>>> np.where(m, a, np.zeros(a.shape))
array([[ 0.,  0.,  0.,  3.],
       [ 0.,  0.,  6.,  0.],
       [ 0.,  9.,  0.,  0.]])
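As a side note, the fallback argument can be a plain scalar, which NumPy broadcasts to the shape of the condition. Passing 0 instead of np.zeros(a.shape) also keeps the integer dtype, because np.zeros defaults to float64. With a single argument, np.where(m) returns the same index arrays as np.nonzero(m).

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
m = (a % 3 == 0)

# The scalar fallback is broadcast to a.shape; the result stays integer.
picked = np.where(m, a, 0)
print(picked.tolist())  # [[0, 0, 0, 3], [0, 0, 6, 0], [0, 9, 0, 0]]

# The one-argument form gives the same indices as np.nonzero.
rows, cols = np.where(m)
print(rows.tolist(), cols.tolist())  # [0, 0, 1, 2] [0, 3, 2, 1]
```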

It turns out there is also numpy.ma.where, which does the same selection for masked arrays and carries each selected element's mask along with its value.
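A minimal sketch of np.ma.where on two small masked arrays (values invented for illustration). Using the second array's mask as the condition keeps the first array's element where the second is masked and takes the second array's element otherwise; an output element is masked only if the element it was taken from is masked.

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4], mask=[False, True, False, True])
y = np.ma.array([10, 20, 30, 40], mask=[True, False, False, True])

# Take y wherever y is unmasked, otherwise fall back to x.
picked = np.ma.where(np.ma.getmaskarray(y), x, y)
print(picked.tolist())  # [1, 20, 30, None]  (None marks a masked element)
```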

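Given a list of masked arrays of the same shape, my code then looks like this:

import numpy as np

# masked_arrays: a list of same-shaped masked arrays
merged = masked_arrays[0]
for m_arr in masked_arrays[1:]:
    # Keep the merged value where m_arr is masked; otherwise take m_arr
    merged = np.ma.where(m_arr.mask, merged, m_arr)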

As I say, not particularly efficient, but certainly easy enough to implement.
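A self-contained version of the loop, wrapped in a function (merge_last_wins is a hypothetical name). Its semantics differ from the np.prod() approach in the question: where two unmasked regions overlap, the later array's value wins instead of the values being multiplied. In the question's data the overlap is 1 and 2, so both give 2; the small inputs below use 2 and 3 so the difference is visible.

```python
import numpy as np

def merge_last_wins(masked_arrays):
    """Hypothetical helper: overlay same-shaped masked arrays in order.

    Each later array overwrites earlier ones wherever it is unmasked; a cell
    stays masked only if it is masked in every input.
    """
    merged = masked_arrays[0]
    for m_arr in masked_arrays[1:]:
        # Where m_arr is masked keep what we have so far, otherwise take m_arr.
        merged = np.ma.where(np.ma.getmaskarray(m_arr), merged, m_arr)
    return merged

a = np.ma.masked_equal(np.array([2, 2, 0, 0]), 0)
b = np.ma.masked_equal(np.array([0, 3, 3, 0]), 0)

overlay = merge_last_wins([a, b])
product = np.ma.array(a.filled(1) * b.filled(1),
                      mask=np.ma.getmaskarray(a) & np.ma.getmaskarray(b))

print(overlay.filled(-1).tolist())  # [2, 3, 3, -1]
print(product.filled(-1).tolist())  # [2, 6, 3, -1]
```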

HTH

Answered Oct 01 '22 22:10 by pelson