Indexing with lists and arrays in numpy appears inconsistent

Inspired by this other question, I'm trying to wrap my mind around advanced indexing in NumPy and build up a more intuitive understanding of how it works.

I've found an interesting case. Here's an array:

>>> y = np.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

If I index it with a scalar, I get a scalar of course:

>>> y[4]
4

With a 1D list of integers, I get a 1D array:

>>> idx = [4, 3, 2, 1]
>>> y[idx]
array([4, 3, 2, 1])

So if I index it with a 2D (nested) list of integers, I get... what do I get?

>>> idx = [[4, 3], [2, 1]]
>>> y[idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array

Oh no! The symmetry is broken. I have to index with a 3D list to get a 2D array!

>>> idx = [[[4, 3], [2, 1]]]
>>> y[idx]
array([[4, 3],
       [2, 1]])

What makes numpy behave this way?

To make this more interesting, I noticed that indexing with numpy arrays (instead of lists) behaves how I'd intuitively expect, and 2D gives me 2D:

>>> idx = np.array([[4, 3], [2, 1]])
>>> y[idx]
array([[4, 3],
       [2, 1]])

This looks inconsistent from where I'm at. What's the rule here?

Kos asked Sep 08 '17 12:09



1 Answer

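The reason is how NumPy interprets a list used as an index: a list that itself contains sequences is interpreted like a tuple, and NumPy treats indexing with a tuple as multidimensional indexing. A flat list of integers such as [4, 3, 2, 1] is not affected, which is why the 1D case works. (This is the behavior of NumPy at the time of the question; indexing with a nested list was deprecated in NumPy 1.15, and newer releases treat it like an array index instead.)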

Just as arr[1, 2] returns the element arr[1][2], arr[[[4, 3], [2, 1]]] is identical to arr[[4, 3], [2, 1]] and, according to the rules of multidimensional indexing, returns the elements arr[4, 2] and arr[3, 1]. Your y has only one dimension, so supplying two index lists raises the IndexError.
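
To make the pairing concrete, here is a minimal sketch. The 2D array `arr` and the names `rows`, `cols`, `picked`, and `raised` are made up for illustration, and the index is written as an explicit tuple so the result does not depend on how a particular NumPy version treats nested lists:

```python
import numpy as np

# Illustrative 2D array whose entries encode their position: arr[i, j] == 10*i + j.
arr = np.arange(5)[:, None] * 10 + np.arange(3)

# One index list per dimension, passed as a tuple.
rows, cols = [4, 3], [2, 1]
picked = arr[rows, cols]      # same as arr[(rows, cols)]

# The lists are paired up element by element: arr[4, 2] and arr[3, 1].
print(picked)                 # [42 31]

# The same two-element tuple on the 1D array from the question asks for
# a second dimension that does not exist.
y = np.arange(10)
try:
    y[rows, cols]
    raised = False
except IndexError:
    raised = True
print(raised)                 # True
```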

Adding one more list tells NumPy that you want to index along the first dimension only: the outermost list is interpreted as a tuple containing a single "list of indices for the first dimension", so arr[[[[4, 3], [2, 1]]]] is equivalent to arr[[[4, 3], [2, 1]],].
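
The one-element tuple can also be spelled out directly, which avoids relying on the extra list. A small sketch, reusing y from the question (the names `idx` and `result` are only for illustration):

```python
import numpy as np

y = np.arange(10)
idx = [[4, 3], [2, 1]]

# A tuple with a single entry: "for the first (and only) dimension, use idx".
result = y[(idx,)]            # equivalent spelling: y[idx,]
print(result.tolist())        # [[4, 3], [2, 1]]
```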

From the documentation:

Example

From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:

>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])

and:

Warning

The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
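
The difference the warning describes can be checked on a small array. This is only a sketch; the 4x4x4 array and the names `basic` and `advanced` are assumptions chosen so that both forms are valid:

```python
import numpy as np

x = np.arange(64).reshape(4, 4, 4)

# x[(1, 2, 3)] is x[1, 2, 3]: basic indexing, one element.
basic = x[(1, 2, 3)]
print(basic)                  # 27, i.e. 1*16 + 2*4 + 3

# x[(1, 2, 3),] is a one-element tuple holding a sequence: advanced indexing
# along the first axis, selecting sub-arrays 1, 2 and 3.
advanced = x[(1, 2, 3),]
print(advanced.shape)         # (3, 4, 4)
```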

In such cases it's probably better to use np.take:

>>> y.take([[4, 3], [2, 1]])  # 2D array
array([[4, 3],
       [2, 1]])

From the np.take documentation:

This function [np.take] does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis.
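
As a rough illustration of the "along a given axis" part, here is np.take with its axis argument on the small x array from the documentation example above (the names `idx`, `rows`, and `cols` are made up for this sketch):

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
idx = [[2, 0], [1, 1]]

# Nested lists are always read as one array of indices, here of row numbers.
rows = np.take(x, idx, axis=0)      # shape (2, 2, 2); rows[0, 0] is x[2]

# Along axis 1 the indexing spelling needs a leading slice; take does not.
cols = np.take(x, [1, 0], axis=1)   # same as x[:, [1, 0]]
```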

Or convert the indices to an array. NumPy special-cases arrays and interprets them as fancy indexing instead of "multidimensional indexing":

>>> y[np.asarray([[4, 3], [2, 1]])]
array([[4, 3],
       [2, 1]])
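
If the indices arrive as lists of unknown nesting, the conversion could be wrapped in a small helper. `fancy_index` below is a hypothetical name, not a NumPy function; it is just a sketch of the asarray approach:

```python
import numpy as np

def fancy_index(arr, indices):
    """Hypothetical helper: index the first axis of `arr` with `indices`,
    always reading nested lists as one integer index array, never as a tuple."""
    return arr[np.asarray(indices, dtype=np.intp)]

y = np.arange(10)
a = fancy_index(y, [[4, 3], [2, 1]])
print(a.tolist())             # [[4, 3], [2, 1]]
```

The result has the same shape as the index array, which matches what y.take gives for the same nested list.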

MSeifert answered Sep 27 '22 23:09