I have an np.array data of shape (28,8,20), and I only need certain entries from it, so I'm taking a slice:
In [41]: index = np.array([ 5, 6, 7, 8, 9, 10, 11, 17, 18, 19])
In [42]: extract = data[:,:,index]
In [43]: extract.shape
Out[43]: (28, 8, 10)
So far so good, everything is as it should be. But now I want to look at just the first two entries on the last index for the first line:
In [45]: extract[0,:,np.array([0,1])].shape
Out[45]: (2, 8)
Wait, that should be (8,2). It switched the indices around, even though it did not when I sliced the last time! According to my understanding, the following should act the same way:
In [46]: extract[0,:,:2].shape
Out[46]: (8, 2)
... but it gives me exactly what I wanted! As long as I keep all three dimensions, though, both methods seem to be equivalent:
In [47]: extract[:,:,np.array([0,1])].shape
Out[47]: (28, 8, 2)
In [48]: extract[:,:,:2].shape
Out[48]: (28, 8, 2)
So what do I do if I want not just the first two entries but an irregular list? I could of course transpose the matrix after the operation, but this seems very counter-intuitive. A better solution to my problem is this (though there might be a more elegant one):
In [64]: extract[0][:,[0,1]].shape
Out[64]: (8, 2)
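For an irregular list of positions, a few equivalent ways to get the (8, n) result are sketched below. This assumes a stand-in array built with np.arange (only its shape (28, 8, 10) matches the question's extract), and the list cols is an arbitrary example. np.take and np.moveaxis are standard NumPy functions, suggested here as alternatives rather than taken from the question.

```python
import numpy as np

# Stand-in for the question's `extract` (assumption: only the shape matters here).
extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)
cols = [0, 1, 7]  # arbitrary irregular positions on the last axis

# 1. Sequential indexing: fix the first axis, then fancy-index the last axis.
via_sequential = extract[0][:, cols]

# 2. np.take on the 2-D sub-array, selecting along the last axis.
via_take = np.take(extract[0], cols, axis=-1)

# 3. Joint indexing gives (3, 8); move the broadcast axis back to the end.
via_moveaxis = np.moveaxis(extract[0, :, cols], 0, -1)

print(via_sequential.shape, via_take.shape, via_moveaxis.shape)  # (8, 3) (8, 3) (8, 3)
```

Option 1 is the one from the question; the other two only show that the same values can be reached without an explicit .T.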
Which brings us to the actual question:
What is the reason for this behaviour? Whoever decided that this is how it should work probably knew more about programming than I do and saw some consistency in it that I am entirely missing. And I will likely keep hitting my head on this unless I have a way to make sense of it.
This is a case of (advanced) partial indexing. There are 2 index arrays (the integer 0 is treated as one once it is combined with an array index) and 1 slice. From the NumPy docs:
If the indexing subspaces are separated (by slice objects), then the broadcasted indexing space is first, followed by the sliced subspace of x.
http://docs.scipy.org/doc/numpy-1.8.1/reference/arrays.indexing.html#advanced-indexing
The advanced indexing example in those docs (where x.shape is (10,20,30,40,50) and ind_1 and ind_2 broadcast to shape (2,3,4)) notes that:
However, x[:,ind_1,:,ind_2] has shape (2,3,4,10,30,50) because there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning. It is always possible to use .transpose() to move the subspace anywhere desired.
In other words, this indexing is not the same as indexing with ind_1 first and then indexing that result with ind_2. The 2 arrays operate jointly to define a (2,3,4) subspace.
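The documented shapes can be checked directly. This sketch reuses the docs' x shape; the concrete index arrays (a (2,3,1) array of zeros and np.arange(4), which broadcast to (2,3,4)) are arbitrary choices made here, and int8 zeros keep the array small since only shapes matter.

```python
import numpy as np

# x.shape from the docs example; values are irrelevant, so use small int8 zeros.
x = np.zeros((10, 20, 30, 40, 50), dtype=np.int8)

# Hypothetical index arrays that broadcast together to shape (2, 3, 4).
ind_1 = np.zeros((2, 3, 1), dtype=int)
ind_2 = np.arange(4)

# Adjacent advanced indices: the (2, 3, 4) subspace replaces axes 1 and 2 in place.
print(x[:, ind_1, ind_2].shape)     # (10, 2, 3, 4, 40, 50)

# Separated by a slice: no unambiguous position, so the subspace goes first.
print(x[:, ind_1, :, ind_2].shape)  # (2, 3, 4, 10, 30, 50)

# Sequential indexing (ind_1 first, then ind_2 on the old 40-axis) is different.
seq = x[:, ind_1][..., ind_2, :]
print(seq.shape)                    # (10, 2, 3, 1, 30, 4, 50)
```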
In your example, extract[0,:,np.array([0,1])] is understood to mean: select a (2,) subspace (the 0 and [0,1] act jointly, not sequentially), and combine that with the sliced middle dimension. Because the slice separates the two indices, the (2,) subspace is placed first, which gives (2,8).
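A quick check of that reading, using an np.arange stand-in with the question's (28, 8, 10) shape (an assumption; the real data is not needed): the joint result holds the same values as the sequential one, only with the axes swapped.

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in data

joint = extract[0, :, np.array([0, 1])]  # 0 and [0, 1] broadcast to a (2,) subspace
print(joint.shape)                       # (2, 8): subspace first, then the sliced axis

sequential = extract[0][:, [0, 1]]       # index one axis at a time
print(sequential.shape)                  # (8, 2)

print(np.array_equal(joint, sequential.T))  # True
```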
A more elaborate example would be extract[[1,0],:,[[0,1],[1,0]]], which produces a (2,2,8) array. This is a (2,2) subspace of the 1st and last dimensions, plus the middle one. On the other hand, extract[[1,0]][:,:,[[0,1],[1,0]]] produces a (2,8,2,2) array, selecting from the 1st and last dimensions separately.
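The joint case can be spelled out element by element: broadcast the two index arrays to (2, 2), then pair them up. The stand-in data below is an assumption; np.broadcast_arrays is used only to make the pairing visible.

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in data
rows = np.array([1, 0])
cols = np.array([[0, 1], [1, 0]])

joint = extract[rows, :, cols]       # rows and cols broadcast to (2, 2)
print(joint.shape)                   # (2, 2, 8)

# Each (i, j) entry uses the i, j-th pair of broadcast indices.
r_b, c_b = np.broadcast_arrays(rows, cols)
for i in range(2):
    for j in range(2):
        assert np.array_equal(joint[i, j], extract[r_b[i, j], :, c_b[i, j]])

separate = extract[rows][:, :, cols]  # first axis, then last axis, independently
print(separate.shape)                 # (2, 8, 2, 2)
```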
The key difference is whether the indexed selections operate sequentially or jointly. The [...][...] syntax is already available to index sequentially; advanced indexing gives you a way to index jointly.
You're right, that's weird. I can only hazard a guess here. I think it's related to the fact that a[[0,1],[0,1],[0,1]].shape is (2,) rather than (2,2,2), and that a[0,1,[0,1,2]] really means a[[0,0,0],[1,1,1],[0,1,2]], which evaluates to array([a[0,1,0],a[0,1,1],a[0,1,2]]). That is, you step through lists-as-indices for each dimension in parallel, with length-one lists and scalars being broadcast to match the longest.
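Both observations are easy to reproduce on a small array; the 3x3x3 np.arange array below is an arbitrary choice for illustration.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)  # arbitrary small example array

# Lists are stepped through in parallel: picks a[0,0,0] and a[1,1,1].
print(a[[0, 1], [0, 1], [0, 1]].shape)     # (2,), not (2, 2, 2)
print(a[[0, 1], [0, 1], [0, 1]].tolist())  # [0, 13]

# Scalars are broadcast against the list, so these three agree.
print(a[0, 1, [0, 1, 2]].tolist())                              # [3, 4, 5]
print(a[[0, 0, 0], [1, 1, 1], [0, 1, 2]].tolist())              # [3, 4, 5]
print(np.array([a[0, 1, 0], a[0, 1, 1], a[0, 1, 2]]).tolist())  # [3, 4, 5]
```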
Conceptually, that would make your extract[0,:,[0,1]] equivalent to extract[[0,0],[slice(None),slice(None)],[0,1]] (that syntax isn't accepted if you specify it manually, though). After stepping through the indices, that would evaluate to array([extract[0,slice(None),0],extract[0,slice(None),1]]). Each of the inner extracts evaluates to a shape (8,) array, so the full result has shape (2,8).
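The conceptual expansion can be checked by stacking the per-index pieces along a new leading axis. The np.arange data is a stand-in with the question's shape; np.stack plays the role of the outer array([...]).

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in data

# One (8,) slice per entry of [0, 1], stacked along a new first axis.
stepped = np.stack([extract[0, :, 0], extract[0, :, 1]])
print(stepped.shape)                                   # (2, 8)
print(np.array_equal(stepped, extract[0, :, [0, 1]]))  # True
```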
So, to conclude, I think it is a side-effect of the broadcasting that is done to give every dimension an index list of the same length, which leads to the : being broadcast too. That is my hypothesis, but I haven't looked at the inner workings of how numpy does this. Perhaps an expert will come along with a better explanation.
This hypothesis does not explain why extract[:,:,[0,1]] does not result in the same behavior. I would have to postulate that the case of only leading ":" is special-cased to keep it out of the list index logic.
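For comparison, the documented rule quoted from the NumPy docs (adjacent advanced indices stay in place, separated ones move to the front) predicts all of the following shapes. The stand-in data is an assumption; the point is where the broadcast axis lands in each case.

```python
import numpy as np

extract = np.arange(28 * 8 * 10).reshape(28, 8, 10)  # stand-in data

# A single advanced index: its axis stays where the indexed axis was.
print(extract[:, :, [0, 1]].shape)    # (28, 8, 2)

# Integer 0 next to the list: adjacent, so the (2,) axis stays in place.
print(extract[:, 0, [0, 1]].shape)    # (28, 2)

# Integer 0 separated from the list by a slice: the (2,) axis moves first.
print(extract[0, :, [0, 1]].shape)    # (2, 8)

# Replacing the integer by a slice leaves one advanced index again.
print(extract[0:1, :, [0, 1]].shape)  # (1, 8, 2)
```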