
np.where and masked array

I'm working with masked arrays thanks to some of the help I've gotten on stackoverflow, but I'm running into a problem with the np.where evaluation of a masked array.

My masked array is:

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

And prints like this:

In [24]: print(m_pt0)
[1 -- 3 0 4 7 6 5]

I'm looking for the index in m_pt0 where m_pt0 == 0. I would expect that

np.where(0 == m_pt0)

would return:

(array([3]),)

However, despite the mask (or because of?), I instead get

(array([1, 3]),)

The entire point of using the mask is to avoid accessing indices that are "blank", so how can I use where (or another function) to retrieve only the indices that are unmasked and match my boolean criteria?

stagermane asked Jan 05 '23 02:01


2 Answers

Use the masked variant of the function, np.ma.where(). The plain np.where() is not aware of the mask, so it can return wrong or unwanted results for masked arrays. The same goes for other functions, such as polyfit(), which has a masked counterpart in np.ma.polyfit().

For example:

In [2]: np.ma.where(0 == m_pt0)
Out[2]: (array([3]),)
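
The same call as a self-contained script, offered as a minimal sketch. It rebuilds the array from the question and stores the result in idx, a name chosen here only for illustration.

```python
import numpy as np

# Array from the question: the element at index 1 is masked.
m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# np.ma.where treats masked entries of the condition as False,
# so only unmasked matches are reported.
idx = np.ma.where(m_pt0 == 0)
print(idx)
```

As with np.where, the one-argument form returns a tuple with one index array per dimension, so idx[0] holds the positions for this 1-D array.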
blubberdiblub answered Jan 06 '23 17:01


The equality test may create confusion. The result is another masked array:

In [19]: 0 == m_pt0
Out[19]: 
masked_array(data = [False -- False True False False False False],
             mask = [False  True False False False False False False],
       fill_value = True)

A masked array has .data and .mask attributes. numpy functions that aren't MA aware just see the .data:

In [20]: _.data
Out[20]: array([False,  True, False,  True, False, False, False, False], dtype=bool)
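
A short sketch of inspecting those attributes on the comparison result. The names cond and mask are illustrative only. The value stored in .data underneath a masked element is an implementation detail that may differ between NumPy versions, so the sketch reads only the mask and does not rely on the hidden data.

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

cond = (m_pt0 == 0)           # the comparison is itself a masked array
print(type(cond).__name__)    # MaskedArray

# getmaskarray always returns a full boolean array, even when the
# array has no masked elements at all.
mask = np.ma.getmaskarray(cond)
print(mask)
```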

np.where sees the two True values in .data and returns both indices:

In [23]: np.where(0 == m_pt0)
Out[23]: (array([1, 3], dtype=int32),)
In [24]: np.where((0 == m_pt0).data)
Out[24]: (array([1, 3], dtype=int32),)

Where possible it is better to use the np.ma version of a function:

In [25]: np.ma.where(0 == m_pt0)
Out[25]: (array([3], dtype=int32),)

Looking at the output of np.source(np.ma.where), I see that the one-argument case does:

if missing == 2:
    return filled(condition, 0).nonzero()
(plus lots of code for the 3 argument use)

That filled call produces:

In [27]: np.ma.filled((0 == m_pt0),0)
Out[27]: array([False, False, False,  True, False, False, False, False], dtype=bool)
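
The one-argument path quoted above can be sketched as a small function. ma_where_1arg is a hypothetical helper written for this illustration, not part of NumPy; it only mirrors the fill-then-nonzero step.

```python
import numpy as np

def ma_where_1arg(condition):
    """Hypothetical helper mirroring the one-argument path of np.ma.where."""
    # Masked entries are replaced by 0 (False for a boolean array),
    # then nonzero() lists the positions that are still True.
    return np.ma.filled(condition, 0).nonzero()

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

result = ma_where_1arg(m_pt0 == 0)
print(result)
```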

MA functions often replace the masked values with something innocuous (0 in this case), or use compressed to remove them from consideration.

In [36]: m_pt0.compressed()
Out[36]: array([1, 3, 0, 4, 7, 6, 5])
In [37]: m_pt0.filled(100)
Out[37]: array([  1, 100,   3,   0,   4,   7,   6,   5])
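
Either method can also be used by hand to answer the original question, as in this sketch. The sentinel -1 and the variable names are assumptions made for the example: the filled route only works if the sentinel can never be a real value or satisfy the test.

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# Option 1: fill masked slots with a sentinel that cannot match the test.
# Assumes -1 never occurs as a real value in this data.
idx_filled = np.flatnonzero(m_pt0.filled(-1) == 0)

# Option 2: search the compressed values, then map those positions
# back to indices in the original array.
valid = np.flatnonzero(~np.ma.getmaskarray(m_pt0))   # unmasked positions
idx_compressed = valid[np.flatnonzero(m_pt0.compressed() == 0)]

print(idx_filled, idx_compressed)
```

Note that compressed() shifts positions, which is why the second option needs the extra lookup through valid.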

A numpy function works correctly on a masked array when it delegates the work to the array's own method:

In [41]: np.nonzero(m_pt0)
Out[41]: (array([0, 2, 4, 5, 6, 7], dtype=int32),)
In [42]: m_pt0.nonzero()
Out[42]: (array([0, 2, 4, 5, 6, 7], dtype=int32),)
In [43]: np.where(m_pt0)
Out[43]: (array([0, 1, 2, 4, 5, 6, 7], dtype=int32),)

np.nonzero delegates. np.where does not.
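
A minimal check of the delegating case, with illustrative variable names. It compares np.nonzero against the array's own nonzero method and against np.ma.where called on the array itself.

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

via_function = np.nonzero(m_pt0)[0]   # forwards to m_pt0.nonzero()
via_method = m_pt0.nonzero()[0]       # the masked array's own method
via_ma_where = np.ma.where(m_pt0)[0]  # mask-aware one-argument where

# All three skip the masked index 1 as well as the zero at index 3.
print(via_function, via_method, via_ma_where)
```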


The repr of a masked array shows the mask. Its str just shows the masked data:

In [31]: m_pt0
Out[31]: 
masked_array(data = [1 -- 3 0 4 7 6 5],
             mask = [False  True False False False False False False],
       fill_value = 999999)
In [32]: str(m_pt0)
Out[32]: '[1 -- 3 0 4 7 6 5]'
hpaulj answered Jan 06 '23 15:01