Explanation of boolean indexing behaviors

Tags: python, numpy, boolean-indexing

For the 2D array y:

import numpy as np

y = np.arange(20).reshape(5, 4)
---
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]

All of the following index expressions select the 1st, 3rd, and 5th rows. This is clear.

print(y[
    [0, 2, 4],
    ::
])
print(y[
    [0, 2, 4],
    ::
])
print(y[
    [True, False, True, False, True],
    ::
])
---
[[ 0  1  2  3]
 [ 8  9 10 11]
 [16 17 18 19]]

Questions

Please help me understand what rules or mechanisms produce the results below.

Replacing the list [] with a tuple produces an empty array with shape (0, 5, 4).

y[
    (True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)

Using a single True adds a new axis.

y[True]
---
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]]])


y[True].shape
---
(1, 5, 4)

Adding an additional boolean True produces the same result.

y[True, True]
---
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]]])

y[True, True].shape
---
(1, 5, 4)

However, adding a boolean False produces the empty array again.

y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)

I am not sure the documentation explains this behavior.

Boolean array indexing

In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].

If there is only one Boolean array and no integer indexing array present, this is straight forward. Care must only be taken to make sure that the boolean index has exactly as many dimensions as it is supposed to work with.
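As a small illustration of the quoted rule for the array above (only a sketch; the names mask, via_mask, and via_nonzero are made up here and do not come from the documentation):

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# A 1-D boolean array with one entry per row of y.
mask = np.array([True, False, True, False, True])

# Indexing with the boolean array ...
via_mask = y[mask]

# ... should match indexing with the integer arrays from mask.nonzero(),
# which returns a tuple holding one index array per dimension of mask.
via_nonzero = y[mask.nonzero()]

print(np.array_equal(via_mask, via_nonzero))  # True
```

This covers boolean arrays only; it does not say what happens with scalar booleans, which is the case asked about here.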

709

asked Jan 06 '21 04:01

mon

1 Answer

Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:

/*
 * This can actually be well defined. A new axis is added,
 * but at the same time no axis is "used". So if we have True,
 * we add a new axis (a bit like with np.newaxis). If it is
 * False, we add a new axis, but this axis has 0 entries.
 */

So if an index is a scalar boolean, a new axis is added. If the value is True the size of the axis is 1, and if the value is False, the size of the axis is zero.
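A minimal sketch of that rule, using the array from the question (the variable names with_true and with_false are only illustrative):

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

with_true = y[True]    # new leading axis of length 1
with_false = y[False]  # new leading axis of length 0

print(with_true.shape)   # (1, 5, 4)
print(with_false.shape)  # (0, 5, 4)

# For True, the result holds the same values as the np.newaxis version.
print(np.array_equal(with_true, y[np.newaxis]))  # True
```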

This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:

import numpy as np

x = np.ones((2, 2))
assert x[x > 0].ndim == 1

x = np.ones(2)
assert x[x > 0].ndim == 1

x = np.ones(())
assert x[x > 0].ndim == 1  # scalar boolean here!

The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be because consecutive 0D boolean indices are treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and are therefore combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
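Applied to the cases in the question, this looks roughly as follows. It is a sketch of the observed behavior rather than a documented guarantee, and as_tuple and as_list are illustrative names. Note that y[(a, b)] is the same expression as y[a, b] in Python, which is why the tuple behaves differently from the list:

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# Several scalar booleans still add only one axis; in these cases its
# length is 1 only when every boolean is True.
print(y[True, True].shape)   # (1, 5, 4)
print(y[True, False].shape)  # (0, 5, 4)

# A tuple of five booleans is read as five scalar boolean indices,
# not as a mask over the five rows.
as_tuple = y[(True, False, True, False, True)]
print(as_tuple.shape)        # (0, 5, 4)

# A list is converted to a 1-D boolean array and selects rows.
as_list = y[[True, False, True, False, True]]
print(as_list.shape)         # (3, 4)
```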

Given that, I would recommend considering this behavior poorly-defined, and avoiding it in favor of well-documented indexing approaches.
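For instance, the results from the question can be obtained with explicit, documented operations. This is one possible sketch; row_mask, rows, expanded, and empty are names chosen for illustration:

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# Select rows with an explicit 1-D boolean array (boolean array indexing).
row_mask = np.array([True, False, True, False, True])
rows = y[row_mask]
print(rows.shape)      # (3, 4)

# Add a leading axis of length 1 explicitly instead of using y[True].
expanded = y[np.newaxis, ...]
print(expanded.shape)  # (1, 5, 4)

# Build an empty array with an explicit shape instead of using y[False].
empty = np.empty((0,) + y.shape, dtype=y.dtype)
print(empty.shape)     # (0, 5, 4)
```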

185

answered Sep 27 '22 20:09

jakevdp
