I want to apply boolean masking both to rows and columns.
With
X = np.array([[1,2,3],[4,5,6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])
X[mask1, mask2]
I expect the output to be
array([[1,2],[4,5]])
instead of
array([1,5])
It's known that
X[:, mask2]
can be used here but that's not a solution for the general case.
I would like to know how it works under the hood and why the result in this case is array([1,5]).
X[mask1, mask2]
is described in Boolean Array Indexing Doc as the equivalent of
In [249]: X[mask1.nonzero()[0], mask2.nonzero()[0]]
Out[249]: array([1, 5])
In [250]: X[[0,1], [0,1]]
Out[250]: array([1, 5])
In effect it is giving you X[0,0] and X[1,1] (pairing the 0s and 1s).
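The pairing described above can be checked directly. The following is a minimal sketch using the arrays from the question; the extra mask `mask3` and the flag `pairing_failed` are illustrative names that do not appear in the original post.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

# Each boolean mask is replaced by the integer positions of its True entries.
rows = mask1.nonzero()[0]   # positions 0 and 1
cols = mask2.nonzero()[0]   # positions 0 and 1

# The two integer arrays are then paired element by element.
paired = X[rows, cols]
assert np.array_equal(paired, X[mask1, mask2])
assert np.array_equal(paired, [X[0, 0], X[1, 1]])

# mask3 (illustrative) selects three columns while mask1 selects two rows,
# so the integer arrays cannot be paired and the indexing fails.
mask3 = np.array([True, True, True])
try:
    X[mask1, mask3]
    pairing_failed = False
except IndexError:
    pairing_failed = True
```

So the result only looks like a diagonal here because both masks happen to select the same number of entries.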
What you want instead is:
In [251]: X[[[0],[1]], [0,1]]
Out[251]:
array([[1, 2],
       [4, 5]])
np.ix_ is a handy tool for creating the right mix of dimensions:
In [258]: np.ix_([0,1],[0,1])
Out[258]:
(array([[0],
        [1]]), array([[0, 1]]))
In [259]: X[np.ix_([0,1],[0,1])]
Out[259]:
array([[1, 2],
       [4, 5]])
That's effectively a column vector for the 1st axis and row vector for the second, together defining the desired rectangle of values.
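To see why a column vector and a row vector define a rectangle, it may help to expand the two index arrays with np.broadcast_arrays. This is only an illustrative sketch; `row_grid` and `col_grid` are names introduced here, not part of the original answer.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

# np.ix_ returns a (2, 1) row index and a (1, 2) column index.
rows, cols = np.ix_([0, 1], [0, 1])

# Broadcasting expands both to a common (2, 2) shape: one (row, col)
# pair for every cell of the desired rectangle.
row_grid, col_grid = np.broadcast_arrays(rows, cols)

# Indexing with the broadcastable arrays and with the expanded grids
# picks the same elements.
rect = X[rows, cols]
rect_from_grids = X[row_grid, col_grid]
```

Each output cell (i, j) is X[row_grid[i, j], col_grid[i, j]], which is why the result has the broadcast shape rather than a 1-D shape.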
Trying to broadcast the boolean arrays themselves in the same way does not work; X[mask1[:,None], mask2] raises an IndexError.
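A quick way to confirm this is to catch the exception. The sketch below assumes the arrays from the question; `broadcast_error` is just an illustrative variable name. A boolean index consumes as many axes as it has dimensions, so a (2, 1) mask plus a 1-D mask asks for three axes of a 2-D array.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

try:
    X[mask1[:, None], mask2]
    broadcast_error = None
except IndexError as exc:
    # Boolean arrays are not broadcast against each other like integer arrays.
    broadcast_error = exc
```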
However, that reference section also says:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
In [260]: X[np.ix_(mask1, mask2)]
Out[260]:
array([[1, 2],
       [4, 5]])
In [261]: np.ix_(mask1, mask2)
Out[261]:
(array([[0],
        [1]], dtype=int32), array([[0, 1]], dtype=int32))
The boolean section of ix_:
if issubdtype(new.dtype, _nx.bool_):
    new, = new.nonzero()
So it also works with a mix of boolean and integer indices, like X[np.ix_(mask1, [0,2])].
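For the general case asked about in the question, the np.ix_ call can be wrapped in a small function. `select_rect` is a hypothetical helper name chosen for this sketch, not something NumPy provides; it assumes a 2-D array and 1-D selectors that are either boolean masks or integer index sequences.

```python
import numpy as np

def select_rect(arr, row_sel, col_sel):
    """Hypothetical helper: pick the rows in row_sel crossed with the columns in col_sel.

    Each selector may be a 1-D boolean mask or a 1-D sequence of integer
    indices; np.ix_ converts boolean masks to integer positions itself.
    """
    return arr[np.ix_(row_sel, col_sel)]

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

both_masks = select_rect(X, mask1, mask2)   # two boolean masks
mixed = select_rect(X, mask1, [0, 2])       # boolean rows, integer columns
one_row = select_rect(X, np.array([False, True]), np.array([True, False, True]))
```

Unlike X[mask1, mask2], this does not require the two masks to select the same number of entries.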
One solution is to index sequentially with integers, obtained for example from np.where. The row mask (mask1) has to be applied to the first axis and the column mask (mask2) to the second:
>>> X[np.where(mask1)[0]][:, np.where(mask2)[0]]
array([[1, 2],
       [4, 5]])
Or, as @user2357112 pointed out in the comments, np.ix_ can be used as well. For example:
>>> X[np.ix_(np.where(mask1)[0], np.where(mask2)[0])]
array([[1, 2],
       [4, 5]])
Another idea is to broadcast your masks against each other and index in one step, which requires a reshape afterwards:
>>> X[np.where(mask1[:, None] * mask2)]
array([1, 2, 4, 5])
>>> X[np.where(mask1[:, None] * mask2)].reshape(2, 2)
array([[1, 2],
       [4, 5]])
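The hard-coded reshape(2, 2) only fits this particular example. A sketch of how the target shape could be derived from the masks themselves follows; the 3x4 array `A` and the masks `row_mask` and `col_mask` are made up for illustration, and `&` is used in place of `*` as the more explicit logical AND.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)                  # illustrative 3x4 array
row_mask = np.array([True, False, True])         # keep rows 0 and 2
col_mask = np.array([True, True, False, True])   # keep columns 0, 1 and 3

# Outer AND of the two masks: True exactly where row and column are both kept.
outer = row_mask[:, None] & col_mask

# Boolean indexing with a full-shape mask returns a flat array in row-major
# order, so the rectangle is restored by reshaping to the two True counts.
flat = A[outer]
rect = flat.reshape(row_mask.sum(), col_mask.sum())
```

This gives the same rectangle as A[np.ix_(row_mask, col_mask)], at the cost of building a full-size boolean mask first.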