Boolean masking on multiple axes with numpy

Tags: python, numpy

I want to apply boolean masking both to rows and columns.

With

X = np.array([[1,2,3],[4,5,6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])
X[mask1, mask2]

I expect the output to be

array([[1,2],[4,5]])

instead of

array([1,5])

It's known that

X[:, mask2]

can be used here but that's not a solution for the general case.

I would like to know how it works under the hood and why in this case the result is array([1,5]).

asked Feb 18 '17 at 00:02 by tarashypka



2 Answers

The boolean array indexing section of the NumPy documentation describes X[mask1, mask2] as equivalent to:

In [249]: X[mask1.nonzero()[0], mask2.nonzero()[0]]
Out[249]: array([1, 5])
In [250]: X[[0,1], [0,1]]
Out[250]: array([1, 5])

In effect it is giving you X[0,0] and X[1,1] (pairing the 0s and 1s).
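The pairing can be checked directly. This is a minimal sketch that reuses X, mask1 and mask2 from the question; the names rows, cols, paired and by_hand are introduced here only for illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

# Each boolean mask is replaced by the integer positions of its True entries.
rows = mask1.nonzero()[0]
cols = mask2.nonzero()[0]

# Two 1-D integer index arrays are matched element by element,
# so this picks X[rows[0], cols[0]] and X[rows[1], cols[1]].
paired = X[rows, cols]
by_hand = np.array([X[r, c] for r, c in zip(rows, cols)])

print(paired)           # [1 5]
print(by_hand)          # [1 5]
print(X[mask1, mask2])  # [1 5]
```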

What you want instead is:

In [251]: X[[[0],[1]], [0,1]]
Out[251]: 
array([[1, 2],
       [4, 5]])

np.ix_ is a handy tool for creating index arrays with the right mix of dimensions:

In [258]: np.ix_([0,1],[0,1])
Out[258]: 
(array([[0],
        [1]]), array([[0, 1]]))
In [259]: X[np.ix_([0,1],[0,1])]
Out[259]: 
array([[1, 2],
       [4, 5]])

That's effectively a column vector for the 1st axis and row vector for the second, together defining the desired rectangle of values.
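To see why a column vector and a row vector define a rectangle, it may help to look at what they broadcast to. The sketch below uses np.broadcast_arrays for that purpose; row_grid, col_grid and block are illustrative names, not part of the answer above.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

rows = np.array([[0], [1]])  # shape (2, 1): column vector of row indices
cols = np.array([[0, 1]])    # shape (1, 2): row vector of column indices

# Advanced indexing broadcasts the index arrays against each other,
# producing one (row, column) pair per cell of the (2, 2) result.
row_grid, col_grid = np.broadcast_arrays(rows, cols)
print(row_grid)  # [[0 0]
                 #  [1 1]]
print(col_grid)  # [[0 1]
                 #  [0 1]]

block = X[rows, cols]
print(block)     # [[1 2]
                 #  [4 5]]
```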

But trying to broadcast boolean arrays like this does not work: X[mask1[:,None], mask2]
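A quick check of that failure, as a sketch: a 2-D boolean index is matched against two axes of X at once, so it cannot act as a broadcastable column vector. Converting the masks to integer positions first does broadcast as intended. The names failed, rows, cols and block are illustrative only.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

# Reshaping the boolean mask to (2, 1) does not make it a column of indices.
try:
    X[mask1[:, None], mask2]
    failed = False
except IndexError as exc:
    failed = True
    print("IndexError:", exc)

# Integer positions, on the other hand, broadcast as (2, 1) against (2,).
rows = mask1.nonzero()[0]
cols = mask2.nonzero()[0]
block = X[rows[:, None], cols]
print(block)  # [[1 2]
              #  [4 5]]
```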

However, the same reference section says:

Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.

In [260]: X[np.ix_(mask1, mask2)]
Out[260]: 
array([[1, 2],
       [4, 5]])
In [261]: np.ix_(mask1, mask2)
Out[261]: 
(array([[0],
        [1]], dtype=int32), array([[0, 1]], dtype=int32))

The part of the ix_ source that handles boolean arrays:

    if issubdtype(new.dtype, _nx.bool_):
        new, = new.nonzero()

So it also works with a mix of boolean and integer indices, such as X[np.ix_(mask1, [0,2])].
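A short sketch of the mixed case, reusing X and mask1 from the question (row_idx, col_idx and mixed are illustrative names):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])

# Boolean mask for the rows, explicit integer positions for the columns.
row_idx, col_idx = np.ix_(mask1, [0, 2])
print(row_idx.shape, col_idx.shape)  # (2, 1) (1, 2)

mixed = X[row_idx, col_idx]
print(mixed)  # [[1 3]
              #  [4 6]]
```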

answered Nov 02 '22 at 10:11 by hpaulj


One solution is to index sequentially with integer arrays, obtained for example from np.where. The row mask (mask1) goes on the first axis and the column mask (mask2) on the second:

>>> X[np.where(mask1)[0]][:, np.where(mask2)[0]]
array([[1, 2],
       [4, 5]])

Alternatively, as @user2357112 pointed out in the comments, np.ix_ can be used as well. For example:

>>> X[np.ix_(np.where(mask1)[0], np.where(mask2)[0])]
array([[1, 2],
       [4, 5]])

Another idea is to broadcast the masks against each other and index in one step, which requires a reshape afterwards:

>>> X[np.where(mask1[:, None] * mask2)]
array([1, 2, 4, 5])

>>> X[np.where(mask1[:, None] * mask2)].reshape(2, 2)
array([[1, 2],
       [4, 5]])
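The reshape(2, 2) above is specific to this example. As a sketch of the general case, the target shape can be taken from the number of True entries in each mask. The 3x4 array, the masks and the names outer and result below are made up for illustration, and & is used in place of * for the logical AND.

```python
import numpy as np

# Hypothetical example with a non-square selection.
X = np.arange(12).reshape(3, 4)
mask1 = np.array([True, False, True])        # keep rows 0 and 2
mask2 = np.array([False, True, True, True])  # keep columns 1, 2 and 3

# Outer AND of the masks: True only where both the row and the column are kept.
outer = mask1[:, None] & mask2

# A 2-D boolean index returns a flat array in row-major order,
# so the rectangle is restored by reshaping to (kept rows, kept columns).
result = X[outer].reshape(mask1.sum(), mask2.sum())
print(result)  # [[ 1  2  3]
               #  [ 9 10 11]]
```

The result matches X[np.ix_(mask1, mask2)] for the same masks.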
answered Nov 02 '22 at 10:11 by MSeifert