np.where and masked array

Question

I'm working with masked arrays, thanks to some help I've gotten on Stack Overflow, but I'm running into a problem when evaluating np.where on a masked array.

My masked array is:

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

And prints like this:

In [24]: print(m_pt0)
[1 -- 3 0 4 7 6 5]

I'm looking for the index in m_pt0 where m_pt0 == 0. I would expect that

np.where(0 == m_pt0)

would return:

(array([3]),)

However, despite the mask (or because of it?), I instead get

(array([1, 3]),)

The entire point of using the mask is to avoid accessing indices that are "blank", so how can I use where (or another function) to retrieve only the indices that are unmasked and match my boolean criteria?

blubberdiblub · Accepted Answer

You need to use the masked variant of the where() function, np.ma.where(); otherwise it will return wrong or unwanted results for masked arrays. The same goes for other functions, like polyfit().

I.e.:

In [2]: np.ma.where(0 == m_pt0)
Out[2]: (array([3]),)
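As a self-contained script, the same call might look like the sketch below. It rebuilds the array from the question and assumes only that NumPy is installed; the name idx is introduced here for illustration.

```python
import numpy as np

# Masked array from the question: index 1 is masked.
m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# The masked-aware where() skips masked entries when looking for matches.
idx = np.ma.where(0 == m_pt0)
print(idx)
```

The result is a tuple with one index array per dimension, so idx[0] holds the positions for this 1-D case.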

hpaulj · Answer

The equality test may create confusion. The result is another masked array:

In [19]: 0 == m_pt0
Out[19]: 
masked_array(data = [False -- False True False False False False],
             mask = [False  True False False False False False False],
       fill_value = True)

A masked array has .data and .mask attributes. NumPy functions that aren't masked-array (MA) aware just see the .data:

In [20]: _.data
Out[20]: array([False,  True, False,  True, False, False, False, False], dtype=bool)

np.where sees the two True values and returns

In [23]: np.where(0 == m_pt0)
Out[23]: (array([1, 3], dtype=int32),)
In [24]: np.where((0 == m_pt0).data)
Out[24]: (array([1, 3], dtype=int32),)
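If you have to stay with plain np.where, one option is to combine the condition with the inverted mask yourself. The sketch below is an illustration, not something from the session above: it deliberately stores a 0 underneath the masked slot so that a test on the raw data matches it, and the names m, raw_hits and valid_hits are made up for the example. What the .data of a comparison result contains at masked positions may differ between NumPy versions, so the sketch tests m.data directly instead of relying on that.

```python
import numpy as np

# Same layout as the question, but the masked slot (index 1) hides a 0,
# so a test on the raw data alone would match it.
m = np.ma.masked_array([1, 0, 3, 0, 4, 7, 6, 5],
                       mask=[False, True, False, False,
                             False, False, False, False])

# Testing .data ignores the mask entirely.
raw_hits = np.where(m.data == 0)[0]

# getmaskarray always returns a full boolean array, even if no mask is set.
valid = ~np.ma.getmaskarray(m)
valid_hits = np.where((m.data == 0) & valid)[0]

print(raw_hits)    # the masked slot leaks through
print(valid_hits)  # only the unmasked match remains
```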

Where possible it is better to use the np.ma version of a function:

In [25]: np.ma.where(0 == m_pt0)
Out[25]: (array([3], dtype=int32),)

Looking at the code with np.source(np.ma.where), I see it does

if missing == 2:
    return filled(condition, 0).nonzero()
(plus lots of code for the 3 argument use)

That filled does:

In [27]: np.ma.filled((0 == m_pt0),0)
Out[27]: array([False, False, False,  True, False, False, False, False], dtype=bool)
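The one-argument path quoted above can be reproduced by hand. This sketch only mirrors the filled(condition, 0).nonzero() line from that listing; the np.ma.where implementation in your NumPy version may be organized differently. The names cond, plain and by_hand are illustrative.

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

cond = (0 == m_pt0)             # masked boolean array
plain = np.ma.filled(cond, 0)   # plain ndarray; masked slots become 0 (False)
by_hand = plain.nonzero()       # indices of the remaining True entries

print(plain)
print(by_hand)
```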

MA functions often replace the masked values with something innocuous (0 in this case), or use compressed() to remove them from consideration.

In [36]: m_pt0.compressed()
Out[36]: array([1, 3, 0, 4, 7, 6, 5])
In [37]: m_pt0.filled(100)
Out[37]: array([  1, 100,   3,   0,   4,   7,   6,   5])
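One thing to watch with compressed(): it drops the masked entries, so positions in its result no longer line up with the original array. If you search the compressed data, you can map the hits back, as in this sketch (kept, hits and orig_idx are illustrative names):

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# Original positions of the entries that survive compressed().
kept = np.flatnonzero(~np.ma.getmaskarray(m_pt0))

# Search the compressed data, then translate back to original indices.
hits = np.flatnonzero(m_pt0.compressed() == 0)
orig_idx = kept[hits]

print(hits)      # position within the compressed array
print(orig_idx)  # position within m_pt0
```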

A NumPy function will work correctly on a masked array if it delegates the work to the array's own method.

In [41]: np.nonzero(m_pt0)
Out[41]: (array([0, 2, 4, 5, 6, 7], dtype=int32),)
In [42]: m_pt0.nonzero()
Out[42]: (array([0, 2, 4, 5, 6, 7], dtype=int32),)
In [43]: np.where(m_pt0)
Out[43]: (array([0, 1, 2, 4, 5, 6, 7], dtype=int32),)

np.nonzero delegates. np.where does not.
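A small check of the delegation point, restricted to the calls that end up in the masked array's own nonzero method. The plain np.where(m_pt0) call is left out on purpose, since it is the one that bypasses the mask. The via_* names are illustrative.

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

via_method = m_pt0.nonzero()[0]        # the masked array's own method
via_numpy = np.nonzero(m_pt0)[0]       # forwards to m_pt0.nonzero()
via_ma_where = np.ma.where(m_pt0)[0]   # masked-aware where, one-argument form

print(via_method)
print(via_numpy)
print(via_ma_where)
```

All three skip index 1 (masked) and index 3 (an unmasked zero).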

The repr of a masked array shows the mask. Its str just shows the masked data:

In [31]: m_pt0
Out[31]: 
masked_array(data = [1 -- 3 0 4 7 6 5],
             mask = [False  True False False False False False False],
       fill_value = 999999)
In [32]: str(m_pt0)
Out[32]: '[1 -- 3 0 4 7 6 5]'
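Since the exact repr layout can vary between NumPy versions, it may be safer to inspect the mask programmatically than to read it off the printed form. A minimal sketch, using names (mask, n_masked) chosen for this example:

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

mask = np.ma.getmaskarray(m_pt0)      # full boolean mask as a plain ndarray
n_masked = np.ma.count_masked(m_pt0)  # number of masked entries

print(mask)
print(n_masked)
print(str(m_pt0))                     # masked entries are shown as '--'
```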

Tags: python, arrays, boolean-operations, numpy, masked-array

stagermane
