Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weird indexing using numpy

Tags:

python

numpy

I have a variable, x, that is of the shape (2,2,50,100).

I also have an array, y, that equals np.array([0,10,20]). A weird thing happens when I index x[0,:,:,y].

x = np.full((2,2,50,100),np.nan)
y = np.array([0,10,20])
print(x.shape)
(2,2,50,100)
print(x[:,:,:,y].shape)
(2,2,50,3)
print(x[0,:,:,:].shape)
(2,50,100)
print(x[0,:,:,y].shape)
(3,2,50)

Why does the last one output (3,2,50) and not (2,50,3)?

like image 350
Paul Scotti Avatar asked Feb 26 '20 21:02

Paul Scotti


People also ask

What is NumPy fancy indexing?

Fancy indexing is conceptually simple: it means passing an array of indices to access multiple array elements at once. For example, consider the following array: import numpy as np rand = np. random. RandomState(42) x = rand.

Can NumPy arrays be indexed?

ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.

Is NumPy indexing fast?

Indexing in NumPy is a reasonably fast operation.

What is negative indexing in NumPy array?

Negative indices are interpreted as counting from the end of the array (i.e., if i < 0, it means n_i + i). All arrays generated by basic slicing are always views of the original array. The standard rules of sequence slicing apply to basic slicing on a per-dimension basis (including using a step index).


2 Answers

This is how numpy uses advanced indexing to broadcast array shapes. When you pass a 0 for the first index, and y for the last index, numpy will broadcast the 0 to be the same shape as y. The following equivalence holds: x[0,:,:,y] == x[(0, 0, 0),:,:,y]. here is an example

import numpy as np

x = np.arange(120).reshape(2,3,4,5)
y = np.array([0,2,4])

np.equal(x[0,:,:,y], x[(0, 0, 0),:,:,y]).all()
# returns:
True

Now, because you are effectively passing in two sets of indices, you are using the advanced indexing API to form (in this case) pairs of indices.

x[(0, 0, 0),:,:,y])

# equivalent to
[
  x[0,:,:,y[0]], 
  x[0,:,:,y[1]], 
  x[0,:,:,y[2]]
]

# equivalent to
rows = np.array([0, 0, 0])
cols = y
x[rows,:,:,cols]

# equivalent to
[
  x[r,:,:,c] for r, c in zip(rows, columns)
]

Which has a first dimension that same as the length of y. This is what you are seeing.

As an example, look at an array with 4 dimensions which are described in the next chunk:

x = np.arange(120).reshape(2,3,4,5)
y = np.array([0,2,4])

# x looks like:
array([[[[  0,   1,   2,   3,   4],    -+      =+
         [  5,   6,   7,   8,   9],     Sheet1  |
         [ 10,  11,  12,  13,  14],     |       |
         [ 15,  16,  17,  18,  19]],   -+       |
                                                Workbook1
        [[ 20,  21,  22,  23,  24],    -+       |
         [ 25,  26,  27,  28,  29],     Sheet2  |
         [ 30,  31,  32,  33,  34],     |       |
         [ 35,  36,  37,  38,  39]],   -+       |
                                                |
        [[ 40,  41,  42,  43,  44],    -+       |
         [ 45,  46,  47,  48,  49],     Sheet3  |
         [ 50,  51,  52,  53,  54],     |       |
         [ 55,  56,  57,  58,  59]]],  -+      =+


       [[[ 60,  61,  62,  63,  64],
         [ 65,  66,  67,  68,  69],
         [ 70,  71,  72,  73,  74],
         [ 75,  76,  77,  78,  79]],

        [[ 80,  81,  82,  83,  84],
         [ 85,  86,  87,  88,  89],
         [ 90,  91,  92,  93,  94],
         [ 95,  96,  97,  98,  99]],

        [[100, 101, 102, 103, 104],
         [105, 106, 107, 108, 109],
         [110, 111, 112, 113, 114],
         [115, 116, 117, 118, 119]]]])

x has a really easy to understand sequential form that we can now use to show what is happening...

The first dimension is like having 2 Excel Workbooks, the second dimension is like having 3 sheets in each workbook, the third dimension is like having 4 rows per sheet, and the last dimension is 5 values for each row (or columns per sheet).

Looking at it this way, asking for x[0,:,:,0], is the saying: "in the first workbook, for each sheet, for each row, give me the first value/column."

x[0,:,:,y[0]]
# returns:
array([[ 0,  5, 10, 15],
       [20, 25, 30, 35],
       [40, 45, 50, 55]])

# this is in the same as the first element in:
x[(0,0,0),:,:,y]

But now with advanced indexing, we can think of x[(0,0,0),:,:,y] as "in the first workbook, for each sheet, for each row, give me the yth value/column. Ok, now do it for each value of y"

x[(0,0,0),:,:,y]
# returns:
array([[[ 0,  5, 10, 15],
        [20, 25, 30, 35],
        [40, 45, 50, 55]],

       [[ 2,  7, 12, 17],
        [22, 27, 32, 37],
        [42, 47, 52, 57]],

       [[ 4,  9, 14, 19],
        [24, 29, 34, 39],
        [44, 49, 54, 59]]])

Where it gets crazy is that numpy will broadcast to match the outer dimensions of index array. So if you want to do that same operation as above, but for BOTH "Excel workbooks", you don't have to loop and concatenate. You can just pass an array to the first dimension, but it MUST have a compatible shape.

Passing an integer gets broadcast to y.shape == (3,). If you want to pass an array as the first index, only the last dimension of the array has to be compatible with y.shape. I.e., the last dimension of the first index must either be 3 or 1.

ix = np.array([[0], [1]])
x[ix,:,:,y].shape
# each row of ix is broadcast to length 3:
(2, 3, 3, 4)

ix = np.array([[0,0,0], [1,1,1]])
x[ix,:,:,y].shape
# this is identical to above:
(2, 3, 3, 4)

ix = np.array([[0], [1], [0], [1], [0]])
x[ix,:,:,y].shape
# ix is broadcast so each row of ix has 3 columns, the length of y
(5, 3, 3, 4)

Found a short explanation in the docs: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing


Edit:

From the original question, to get a one-liner of your desired subslicing, you can use x[0][:,:,y]:

x[0][:,:,y].shape
# returns
(2, 50, 3)

However, if you are trying to assign to those subslices, you have to be very careful that you are looking at a shared memory view of the original array. Otherwise the assignment won't be to the original array, but a copy.

Shared memory only occurs when you are use an integer or slice to subset your array, i.e. x[:,0:3,:,:] or x[0,:,:,1:-1].

np.shares_memory(x, x[0])
# returns:
True

np.shares_memory(x, x[:,:,:,y])
# returns:
False

In both your original question and my example y is neither an int or a slice, so will always end up assigning to a copy of the original.

BUT! Because your array for y can be expressed as a slice, you CAN actually get an assignable view of your array via:

x[0,:,:,0:21:10].shape
# returns:
(2, 50, 3)

np.shares_memory(x, x[0,:,:,0:21:10])
# returns:
True

# actually assigns to the original array
x[0,:,:,0:21:10] = 100

Here we use the slice 0:21:10 to grab every index that would be in range(0,21,10). We have to use 21 and not 20 because the stop-point is excluded from the slice, just like in the range function.

So basically, if you can construct a slice that fits your subslicing criteria, you can do assignment.

like image 154
James Avatar answered Oct 20 '22 06:10

James


It is called combining advanced and basic indexing. In combining advanced and basic indexing, numpy do the indexing in the advanced indexing first and subspace/concatenate the result to the dimension of basic indexing.

Example from docs:

Let x.shape be (10,20,30,40,50) and suppose ind_1 and ind_2 can be broadcast to the shape (2,3,4). Then x[:,ind_1,ind_2] has shape (10,2,3,4,40,50) because the (20,30)-shaped subspace from X has been replaced with the (2,3,4) subspace from the indices. However, x[:,ind_1,:,ind_2] has shape (2,3,4,10,30,50) because there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning. It is always possible to use .transpose() to move the subspace anywhere desired. Note that this example cannot be replicated using take.

so, on x[0,:,:,y], 0 and y are advance indexing. They are broadcast together to yield dimension (3,).

In [239]: np.broadcast(0,y).shape
Out[239]: (3,)

This (3,) tacks to the beginning of 2nd and 3rd dimension to make (3, 2, 50)

To see that the 1st and last dimension are really broadcasting together, you may try change 0 to [0,1] to see the error of broadcasting

print(x[[0,1],:,:,y])

Output:
IndexError                                Traceback (most recent call last)
<ipython-input-232-5d10156346f5> in <module>
----> 1 x[[0,1],:,:,y]

IndexError: shape mismatch: indexing arrays could not be broadcast together with
 shapes (2,) (3,)
like image 33
Andy L. Avatar answered Oct 20 '22 06:10

Andy L.