Indexing with lists and arrays in numpy appears inconsistent

Inspired by this other question, I'm trying to wrap my mind around advanced indexing in NumPy and build up more intuitive understanding of how it works.

I've found an interesting case. Here's an array:

>>> y = np.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

If I index it with a scalar, I get a scalar, of course:

>>> y[4]
4

With a 1D list of integers, I get another 1D array:

>>> idx = [4, 3, 2, 1]
>>> y[idx]
array([4, 3, 2, 1])

So if I index it with a 2D list of integers, I get... what do I get?

>>> idx = [[4, 3], [2, 1]]
>>> y[idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array

Oh no! The symmetry is broken. I have to index with a 3D list to get a 2D array!

>>> idx = [[[4, 3], [2, 1]]]
>>> y[idx]
array([[4, 3],
       [2, 1]])

What makes numpy behave this way?

To make this more interesting, I noticed that indexing with NumPy arrays (instead of lists) behaves as I'd intuitively expect, and 2D gives me 2D:

>>> idx = np.array([[4, 3], [2, 1]])
>>> y[idx]
array([[4, 3],
       [2, 1]])

This looks inconsistent from where I'm at. What's the rule here?

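The reason is how NumPy interprets a list used as an index: a list that contains other sequences (such as [[4, 3], [2, 1]]) is treated like a tuple, and indexing with a tuple is interpreted by NumPy as multidimensional indexing, with one entry per dimension. A flat list of integers such as [4, 3, 2, 1] is not unpacked this way, which is why the 1D case works.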

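Just as arr[1, 2] returns the element arr[1][2], arr[[[4, 3], [2, 1]]] is identical to arr[[4, 3], [2, 1]] and, according to the rules of multidimensional indexing, returns the elements arr[4, 2] and arr[3, 1] of a 2D arr. Applied to the 1D y, that is two indices for a single dimension, hence "IndexError: too many indices for array".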
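A minimal sketch of this rule. It spells out the old list-to-tuple conversion with an explicit tuple(...) so that it runs the same on any NumPy version; as far as I know, NumPy 1.23 and later no longer unpack a nested list this way and treat y[[[4, 3], [2, 1]]] like y[np.asarray([[4, 3], [2, 1]])] instead. The 5x5 array arr is an arbitrary example, not from the question:

```python
import numpy as np

y = np.arange(10)
arr = np.arange(25).reshape(5, 5)  # arbitrary 2D example array
idx = [[4, 3], [2, 1]]

# Old behavior, written out explicitly: the outer list becomes a tuple,
# so arr[tuple(idx)] is arr[[4, 3], [2, 1]], one index list per dimension.
picked = arr[tuple(idx)]
print(picked)  # [22 16], i.e. arr[4, 2] and arr[3, 1]

# On the 1D y the same tuple supplies two indices for one dimension.
try:
    y[tuple(idx)]
except IndexError as exc:
    print("IndexError:", exc)
```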

By adding one more list, you tell NumPy that you want to index only along the first dimension: the outermost list is again treated as the tuple, so arr[[[[4, 3], [2, 1]]]] is interpreted as if you had passed a single "list of indices for the first dimension", namely [[4, 3], [2, 1]].
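Spelling the extra outer list as a one-element tuple makes the "single index for the first dimension" reading explicit, and, unlike the nested list, it means the same thing on every NumPy version I know of. A minimal sketch:

```python
import numpy as np

y = np.arange(10)
idx = [[4, 3], [2, 1]]

# A one-element tuple holding the 2D list: only the first axis is indexed,
# and the result takes the shape of the index, here 2x2.
out = y[(idx,)]
print(out.shape)     # (2, 2)
print(out.tolist())  # [[4, 3], [2, 1]]
```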

From the documentation:

Example

From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])

and:

Warning

The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
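A short sketch of that difference, using an arbitrary 4x4x4 array:

```python
import numpy as np

x = np.arange(64).reshape(4, 4, 4)

basic = x[(1, 2, 3)]      # same as x[1, 2, 3]: one integer per axis, a scalar
advanced = x[(1, 2, 3),]  # a tuple holding one sequence: picks x[1], x[2], x[3]

print(basic)           # 27
print(advanced.shape)  # (3, 4, 4)
```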

In such cases it's probably better to use np.take:

>>> y.take([[4, 3], [2, 1]])  # 2D array
array([[4, 3],
       [2, 1]])

This function [np.take] does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis.
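The axis parameter is where np.take saves typing: it selects along one axis without writing out slices for the others. A sketch with an arbitrary 3x4 array, compared against the equivalent fancy index:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
cols = [[3, 0], [1, 2]]

# Select along axis 1; the index shape (2, 2) replaces that axis -> (3, 2, 2).
by_take = a.take(cols, axis=1)
by_index = a[:, np.asarray(cols)]
print(by_take.shape)                      # (3, 2, 2)
print(np.array_equal(by_take, by_index))  # True

# Without axis, take indexes the flattened array, like y.take above.
flat = a.take([[4, 3], [2, 1]])
print(flat)
```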

Or convert the indices to an array. NumPy special-cases arrays (never unpacking them into a tuple), so the array is used as a single fancy index for the first dimension instead of being split into one index per dimension:

>>> y[np.asarray([[4, 3], [2, 1]])]
array([[4, 3],
       [2, 1]])
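The same conversion works for arrays with more dimensions: a single integer-array index replaces the first axis with the index's own shape and keeps the remaining axes. A sketch with an arbitrary 5x3 array arr (not from the question):

```python
import numpy as np

y = np.arange(10)
arr = np.arange(15).reshape(5, 3)
idx = np.asarray([[4, 3], [2, 1]])

print(y[idx].shape)    # (2, 2): the index shape, no axes of y left over
print(arr[idx].shape)  # (2, 2, 3): the index shape, then the remaining axis
print(arr[idx][0, 0])  # [12 13 14], i.e. the row arr[4]
```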

Tags: python, indexing, numpy

Kos

MSeifert
