
How does numpy order array slice indices?

I have an np.array data of shape (28,8,20), and I only need certain entries from it, so I'm taking a slice:

In [41]: index = np.array([ 5,  6,  7,  8,  9, 10, 11, 17, 18, 19])
In [42]: extract = data[:,:,index]
In [43]: extract.shape
Out[43]: (28, 8, 10)

So far so good, everything as it should be. But now I want to look at just the first two entries on the last index for the first line:

In [45]: extract[0,:,np.array([0,1])].shape
Out[45]: (2, 8)

Wait, that should be (8,2). It switched the indices around, even though it did not when I sliced the last time! According to my understanding, the following should act the same way:

In [46]: extract[0,:,:2].shape
Out[46]: (8, 2)

... but it gives me exactly what I wanted! As long as I have a 3D-array, though, both methods seem to be equivalent:

In [47]: extract[:,:,np.array([0,1])].shape
Out[47]: (28, 8, 2)

In [48]: extract[:,:,:2].shape
Out[48]: (28, 8, 2)

So what do I do if I want not just the first two entries but an irregular list? I could of course transpose the matrix after the operation but this seems very counter-intuitive. A better solution to my problem is this (though there might be a more elegant one):

In [64]: extract[0][:,[0,1]].shape
Out[64]: (8, 2)

Which brings us to the actual question:

I wonder what the reason for this behaviour is? Whoever decided that this is how it should work probably knew more about programming than I do and thought that this is consistent in some way that I am entirely missing. And I will likely keep hitting my head on this unless I have a way to make sense of it.

asked Nov 23 '14 21:11 by Zak


2 Answers

This is a case of (advanced) partial indexing. There are 2 index arrays (the scalar 0 counts as one) and 1 slice.

If the indexing subspaces are separated (by slice objects), then the broadcasted indexing space is first, followed by the sliced subspace of x.

http://docs.scipy.org/doc/numpy-1.8.1/reference/arrays.indexing.html#advanced-indexing

The advanced indexing example in the docs notes that, when the broadcast ind_1, ind_2 subspace has shape (2,3,4):

However, x[:,ind_1,:,ind_2] has shape (2,3,4,10,30,50) because there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning. It is always possible to use .transpose() to move the subspace anywhere desired.

In other words, this indexing is not the same as x[:, ind_1][:, ind_2]. The 2 arrays operate jointly to define a (2,3,4) subspace.
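The documentation example can be reproduced with a small sketch. The array shape (10,20,30,40,50) is the one used in the docs; the contents of ind_1 and ind_2 are placeholders I made up, chosen only so that they broadcast to (2,3,4).

```python
import numpy as np

# Shape taken from the NumPy docs example; uint8 keeps the array small.
x = np.zeros((10, 20, 30, 40, 50), dtype=np.uint8)

# Hypothetical index arrays: only their shapes matter here.
ind_1 = np.zeros((2, 3, 4), dtype=int)
ind_2 = np.zeros((4,), dtype=int)      # broadcasts against ind_1 to (2, 3, 4)

# Index arrays next to each other: the (2, 3, 4) subspace replaces axes 1 and 2.
adjacent = x[:, ind_1, ind_2]
print(adjacent.shape)    # (10, 2, 3, 4, 40, 50)

# Index arrays separated by a slice: the subspace is moved to the front.
separated = x[:, ind_1, :, ind_2]
print(separated.shape)   # (2, 3, 4, 10, 30, 50)

# .transpose() can put the subspace back wherever it is wanted.
moved = separated.transpose(3, 0, 1, 2, 4, 5)
print(moved.shape)       # (10, 2, 3, 4, 30, 50)
```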

In your example, extract[0,:,np.array([0,1])] is understood to mean, select a (2,) subspace (the [0] and [0,1] act jointly, not sequentially), and combine that in some way with the middle dimension.

A more elaborate example would be extract[[1,0],:,[[0,1],[1,0]]], which produces a (2,2,8) array. This is a (2,2) subspace of the 1st and last dimensions, plus the middle one. On the other hand, extract[[1,0]][:,:,[[0,1],[1,0]]] produces a (2,8,2,2) array, selecting from the 1st and last dimensions separately.

The key difference is whether the indexed selections operate sequentially or jointly. The [...][...] syntax is already available to operate sequentially. Advanced indexing gives you a way to index jointly.
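As a rough illustration of joint versus sequential indexing, here is a sketch on an array with the same shape as extract in the question. The values are arbitrary (np.arange), and the names joint, sequential, and cols are just labels for this example.

```python
import numpy as np

# Stand-in for the question's `extract`: same shape, arbitrary contents.
extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)
cols = np.array([0, 1])

# Joint: the scalar 0 and `cols` are broadcast together, and because a slice
# sits between them the broadcast (2,) subspace goes first.
joint = extract[0, :, cols]
print(joint.shape)        # (2, 8)

# Sequential: pick row 0 first, then index the last axis of the 2D result.
sequential = extract[0][:, cols]
print(sequential.shape)   # (8, 2)

# Same numbers, different axis order.
print(np.array_equal(joint.T, sequential))   # True

# The more elaborate example from above.
rows = np.array([1, 0])
pairs = np.array([[0, 1], [1, 0]])
print(extract[rows, :, pairs].shape)         # (2, 2, 8)
print(extract[rows][:, :, pairs].shape)      # (2, 8, 2, 2)
```

So if the (8, 2) layout is what you want, either index sequentially or transpose the joint result.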

answered Oct 12 '22 22:10 by hpaulj


You're right, that's weird. I can only hazard a guess here. I think it's related to the fact that a[[0,1],[0,1],[0,1]].shape is (2,) rather than (2,2,2) and that a[0,1,[0,1,2]] really means a[[0,0,0],[1,1,1],[0,1,2]] which evaluates to array([a[0,1,0],a[0,1,1],a[0,1,2]]). That is, you step through lists-as-indices for each dimension in parallel, with length-one lists and scalars being broadcast to match the longest.

Conceptually, that would make your extract[0,:,[0,1]] equivalent to extract[[0,0],[slice(None),slice(None)],[0,1]] (that syntax isn't accepted if you specify it manually, though). After stepping through the indices, that would evaluate to array([extract[0,slice(None),0],extract[0,slice(None),1]]). Each of the inner extracts evaluates to a shape (8,) array, so the full result is shape (2,8).
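The stepping picture can at least be checked numerically. This is only a sketch on made-up data (an np.arange array with the shape of extract); it shows that the results agree, not how numpy computes them internally.

```python
import numpy as np

# Stand-in array with the same shape as `extract`; the values are arbitrary.
extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)

# Scalars are broadcast against the index list, as described above.
a = extract
print(np.array_equal(a[0, 1, [0, 1, 2]], a[[0, 0, 0], [1, 1, 1], [0, 1, 2]]))  # True

# Stepping through the broadcast indices by hand, one (8,) row per step.
stepped = np.array([extract[0, :, 0], extract[0, :, 1]])
fancy = extract[0, :, [0, 1]]

print(stepped.shape, fancy.shape)       # (2, 8) (2, 8)
print(np.array_equal(stepped, fancy))   # True
```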

So to conclude I think it is a side-effect of the broadcasting that is done to make all the dimensions have an index list of the same length, which leads to : being broadcast too. That is my hypothesis, but I haven't looked at the inner workings of how numpy does this. Perhaps an expert will come along with a better explanation.

This hypothesis does not explain why extract[:,:,[0,1]] does not result in the same behavior. I would have to postulate that the case of only leading ":" is special-cased to avoid participating in the list index logic.
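For comparison, the shapes below are consistent with the documented rule quoted in the other answer: the broadcast subspace stays in place when the advanced indices are adjacent (or there is only one), and moves to the front when a slice separates them. A small sketch on made-up data, assuming an np.arange array with the shape of extract:

```python
import numpy as np

# Stand-in array with the same shape as `extract`; the values are arbitrary.
extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)

# A single index array: its dimension stays where it was.
print(extract[:, :, [0, 1]].shape)        # (28, 8, 2)

# Two adjacent index arrays: their joint (2,) subspace replaces axes 1 and 2.
print(extract[:, [0, 1], [0, 1]].shape)   # (28, 2)

# Scalar and list separated by a slice: the (2,) subspace moves to the front.
print(extract[0, :, [0, 1]].shape)        # (2, 8)

# A slice in place of the scalar is not an advanced index, so nothing moves.
kept = extract[0:1, :, [0, 1]]
print(kept.shape)                          # (1, 8, 2)
print(np.array_equal(kept[0], extract[0][:, [0, 1]]))   # True
```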

answered Oct 12 '22 22:10 by amaurea