I have a variable, x, that is of the shape (2,2,50,100). I also have an array, y, that equals np.array([0,10,20]). A weird thing happens when I index x[0,:,:,y]. <pre class="prettyprint"><code>x = np.full((2,2,50,100),np.nan) y = np.array([0,10,20]) print(x.shape) (2,2,50,100) print(x[:,:,:,y].shape) (2,2,50,3) print(x[0,:,:,:].shape) (2,50,100) print(x[0,:,:,y].shape) (3,2,50) </code></pre> Why does the last one output (3,2,50) and not (2,50,3)?

This is how numpy uses advanced indexing to broadcast array shapes. When you pass a <code>0</code> for the first index, and <code>y</code> for the last index, numpy will broadcast the <code>0</code> to be the same shape as <code>y</code>. The following equivalence holds: <code>x[0,:,:,y] == x[(0, 0, 0),:,:,y]</code>. here is an example <pre class="prettyprint"><code>import numpy as np x = np.arange(120).reshape(2,3,4,5) y = np.array([0,2,4]) np.equal(x[0,:,:,y], x[(0, 0, 0),:,:,y]).all() # returns: True </code></pre> Now, because you are effectively passing in two sets of indices, you are using the advanced indexing API to form (in this case) pairs of indices. <pre class="prettyprint"><code>x[(0, 0, 0),:,:,y]) # equivalent to [ x[0,:,:,y[0]], x[0,:,:,y[1]], x[0,:,:,y[2]] ] # equivalent to rows = np.array([0, 0, 0]) cols = y x[rows,:,:,cols] # equivalent to [ x[r,:,:,c] for r, c in zip(rows, columns) ] </code></pre> Which has a first dimension that same as the length of <code>y</code>. This is what you are seeing. As an example, look at an array with 4 dimensions which are described in the next chunk: <pre class="prettyprint"><code>x = np.arange(120).reshape(2,3,4,5) y = np.array([0,2,4]) # x looks like: array([[[[ 0, 1, 2, 3, 4], -+ =+ [ 5, 6, 7, 8, 9], Sheet1 | [ 10, 11, 12, 13, 14], | | [ 15, 16, 17, 18, 19]], -+ | Workbook1 [[ 20, 21, 22, 23, 24], -+ | [ 25, 26, 27, 28, 29], Sheet2 | [ 30, 31, 32, 33, 34], | | [ 35, 36, 37, 38, 39]], -+ | | [[ 40, 41, 42, 43, 44], -+ | [ 45, 46, 47, 48, 49], Sheet3 | [ 50, 51, 52, 53, 54], | | [ 55, 56, 57, 58, 59]]], -+ =+ [[[ 60, 61, 62, 63, 64], [ 65, 66, 67, 68, 69], [ 70, 71, 72, 73, 74], [ 75, 76, 77, 78, 79]], [[ 80, 81, 82, 83, 84], [ 85, 86, 87, 88, 89], [ 90, 91, 92, 93, 94], [ 95, 96, 97, 98, 99]], [[100, 101, 102, 103, 104], [105, 106, 107, 108, 109], [110, 111, 112, 113, 114], [115, 116, 117, 118, 119]]]]) </code></pre> <code>x</code> has a really easy to understand sequential form that we can now use to show what is happening... The first dimension is like having 2 Excel Workbooks, the second dimension is like having 3 sheets in each workbook, the third dimension is like having 4 rows per sheet, and the last dimension is 5 values for each row (or columns per sheet). Looking at it this way, asking for <code>x[0,:,:,0]</code>, is the saying: "in the first workbook, for each sheet, for each row, give me the first value/column." <pre class="prettyprint"><code>x[0,:,:,y[0]] # returns: array([[ 0, 5, 10, 15], [20, 25, 30, 35], [40, 45, 50, 55]]) # this is in the same as the first element in: x[(0,0,0),:,:,y] </code></pre> But now with advanced indexing, we can think of <code>x[(0,0,0),:,:,y]</code> as "in the first workbook, for each sheet, for each row, give me the <code>y</code>th value/column. Ok, now do it for each value of <code>y</code>" <pre class="prettyprint"><code>x[(0,0,0),:,:,y] # returns: array([[[ 0, 5, 10, 15], [20, 25, 30, 35], [40, 45, 50, 55]], [[ 2, 7, 12, 17], [22, 27, 32, 37], [42, 47, 52, 57]], [[ 4, 9, 14, 19], [24, 29, 34, 39], [44, 49, 54, 59]]]) </code></pre> Where it gets crazy is that numpy will broadcast to match the outer dimensions of index array. So if you want to do that same operation as above, but for BOTH "Excel workbooks", you don't have to loop and concatenate. You can just pass an array to the first dimension, but it MUST have a compatible shape. Passing an integer gets broadcast to <code>y.shape == (3,)</code>. If you want to pass an array as the first index, only the last dimension of the array has to be compatible with <code>y.shape</code>. I.e., the last dimension of the first index must either be 3 or 1. <pre class="prettyprint lang-py prettyprint-override"><code>ix = np.array([[0], [1]]) x[ix,:,:,y].shape # each row of ix is broadcast to length 3: (2, 3, 3, 4) ix = np.array([[0,0,0], [1,1,1]]) x[ix,:,:,y].shape # this is identical to above: (2, 3, 3, 4) ix = np.array([[0], [1], [0], [1], [0]]) x[ix,:,:,y].shape # ix is broadcast so each row of ix has 3 columns, the length of y (5, 3, 3, 4) </code></pre> Found a short explanation in the docs: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing <hr> <h3>Edit:</h3> From the original question, to get a one-liner of your desired subslicing, you can use <code>x[0][:,:,y]</code>: <pre class="prettyprint"><code>x[0][:,:,y].shape # returns (2, 50, 3) </code></pre> However, if you are trying to assign to those subslices, you have to be very careful that you are looking at a shared memory view of the original array. Otherwise the assignment won't be to the original array, but a copy. Shared memory only occurs when you are use an integer or slice to subset your array, i.e. <code>x[:,0:3,:,:]</code> or <code>x[0,:,:,1:-1]</code>. <pre class="prettyprint lang-py prettyprint-override"><code>np.shares_memory(x, x[0]) # returns: True np.shares_memory(x, x[:,:,:,y]) # returns: False </code></pre> In both your original question and my example <code>y</code> is neither an int or a slice, so will always end up assigning to a copy of the original. BUT! Because your array for <code>y</code> can be expressed as a slice, you CAN actually get an assignable view of your array via: <pre class="prettyprint lang-py prettyprint-override"><code>x[0,:,:,0:21:10].shape # returns: (2, 50, 3) np.shares_memory(x, x[0,:,:,0:21:10]) # returns: True # actually assigns to the original array x[0,:,:,0:21:10] = 100 </code></pre> Here we use the slice <code>0:21:10</code> to grab every index that would be in <code>range(0,21,10)</code>. We have to use <code>21</code> and not <code>20</code> because the stop-point is excluded from the slice, just like in the <code>range</code> function. So basically, if you can construct a slice that fits your subslicing criteria, you can do assignment.

Weird indexing using numpy

Tags:

python

numpy

I have a variable, x, that is of the shape (2,2,50,100).

I also have an array, y, that equals np.array([0,10,20]). A weird thing happens when I index x[0,:,:,y].

Click to copy

x = np.full((2,2,50,100),np.nan)
y = np.array([0,10,20])
print(x.shape)
(2,2,50,100)
print(x[:,:,:,y].shape)
(2,2,50,3)
print(x[0,:,:,:].shape)
(2,50,100)
print(x[0,:,:,y].shape)
(3,2,50)

Why does the last one output (3,2,50) and not (2,50,3)?

350

asked Feb 26 '20 21:02

Paul Scotti

2 Answers

This is how numpy uses advanced indexing to broadcast array shapes. When you pass a 0 for the first index, and y for the last index, numpy will broadcast the 0 to be the same shape as y. The following equivalence holds: x[0,:,:,y] == x[(0, 0, 0),:,:,y]. here is an example

Click to copy

import numpy as np

x = np.arange(120).reshape(2,3,4,5)
y = np.array([0,2,4])

np.equal(x[0,:,:,y], x[(0, 0, 0),:,:,y]).all()
# returns:
True

Now, because you are effectively passing in two sets of indices, you are using the advanced indexing API to form (in this case) pairs of indices.

Click to copy

x[(0, 0, 0),:,:,y])

# equivalent to
[
  x[0,:,:,y[0]], 
  x[0,:,:,y[1]], 
  x[0,:,:,y[2]]
]

# equivalent to
rows = np.array([0, 0, 0])
cols = y
x[rows,:,:,cols]

# equivalent to
[
  x[r,:,:,c] for r, c in zip(rows, columns)
]

Which has a first dimension that same as the length of y. This is what you are seeing.

As an example, look at an array with 4 dimensions which are described in the next chunk:

Click to copy

x = np.arange(120).reshape(2,3,4,5)
y = np.array([0,2,4])

# x looks like:
array([[[[  0,   1,   2,   3,   4],    -+      =+
         [  5,   6,   7,   8,   9],     Sheet1  |
         [ 10,  11,  12,  13,  14],     |       |
         [ 15,  16,  17,  18,  19]],   -+       |
                                                Workbook1
        [[ 20,  21,  22,  23,  24],    -+       |
         [ 25,  26,  27,  28,  29],     Sheet2  |
         [ 30,  31,  32,  33,  34],     |       |
         [ 35,  36,  37,  38,  39]],   -+       |
                                                |
        [[ 40,  41,  42,  43,  44],    -+       |
         [ 45,  46,  47,  48,  49],     Sheet3  |
         [ 50,  51,  52,  53,  54],     |       |
         [ 55,  56,  57,  58,  59]]],  -+      =+


       [[[ 60,  61,  62,  63,  64],
         [ 65,  66,  67,  68,  69],
         [ 70,  71,  72,  73,  74],
         [ 75,  76,  77,  78,  79]],

        [[ 80,  81,  82,  83,  84],
         [ 85,  86,  87,  88,  89],
         [ 90,  91,  92,  93,  94],
         [ 95,  96,  97,  98,  99]],

        [[100, 101, 102, 103, 104],
         [105, 106, 107, 108, 109],
         [110, 111, 112, 113, 114],
         [115, 116, 117, 118, 119]]]])

x has a really easy to understand sequential form that we can now use to show what is happening...

The first dimension is like having 2 Excel Workbooks, the second dimension is like having 3 sheets in each workbook, the third dimension is like having 4 rows per sheet, and the last dimension is 5 values for each row (or columns per sheet).

Looking at it this way, asking for x[0,:,:,0], is the saying: "in the first workbook, for each sheet, for each row, give me the first value/column."

Click to copy

x[0,:,:,y[0]]
# returns:
array([[ 0,  5, 10, 15],
       [20, 25, 30, 35],
       [40, 45, 50, 55]])

# this is in the same as the first element in:
x[(0,0,0),:,:,y]

But now with advanced indexing, we can think of x[(0,0,0),:,:,y] as "in the first workbook, for each sheet, for each row, give me the yth value/column. Ok, now do it for each value of y"

Click to copy

x[(0,0,0),:,:,y]
# returns:
array([[[ 0,  5, 10, 15],
        [20, 25, 30, 35],
        [40, 45, 50, 55]],

       [[ 2,  7, 12, 17],
        [22, 27, 32, 37],
        [42, 47, 52, 57]],

       [[ 4,  9, 14, 19],
        [24, 29, 34, 39],
        [44, 49, 54, 59]]])

Where it gets crazy is that numpy will broadcast to match the outer dimensions of index array. So if you want to do that same operation as above, but for BOTH "Excel workbooks", you don't have to loop and concatenate. You can just pass an array to the first dimension, but it MUST have a compatible shape.

Passing an integer gets broadcast to y.shape == (3,). If you want to pass an array as the first index, only the last dimension of the array has to be compatible with y.shape. I.e., the last dimension of the first index must either be 3 or 1.

Click to copy

ix = np.array([[0], [1]])
x[ix,:,:,y].shape
# each row of ix is broadcast to length 3:
(2, 3, 3, 4)

ix = np.array([[0,0,0], [1,1,1]])
x[ix,:,:,y].shape
# this is identical to above:
(2, 3, 3, 4)

ix = np.array([[0], [1], [0], [1], [0]])
x[ix,:,:,y].shape
# ix is broadcast so each row of ix has 3 columns, the length of y
(5, 3, 3, 4)

Found a short explanation in the docs: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing

Edit:

From the original question, to get a one-liner of your desired subslicing, you can use x[0][:,:,y]:

Click to copy

x[0][:,:,y].shape
# returns
(2, 50, 3)

However, if you are trying to assign to those subslices, you have to be very careful that you are looking at a shared memory view of the original array. Otherwise the assignment won't be to the original array, but a copy.

Shared memory only occurs when you are use an integer or slice to subset your array, i.e. x[:,0:3,:,:] or x[0,:,:,1:-1].

Click to copy

np.shares_memory(x, x[0])
# returns:
True

np.shares_memory(x, x[:,:,:,y])
# returns:
False

In both your original question and my example y is neither an int or a slice, so will always end up assigning to a copy of the original.

BUT! Because your array for y can be expressed as a slice, you CAN actually get an assignable view of your array via:

Click to copy

x[0,:,:,0:21:10].shape
# returns:
(2, 50, 3)

np.shares_memory(x, x[0,:,:,0:21:10])
# returns:
True

# actually assigns to the original array
x[0,:,:,0:21:10] = 100

Here we use the slice 0:21:10 to grab every index that would be in range(0,21,10). We have to use 21 and not 20 because the stop-point is excluded from the slice, just like in the range function.

So basically, if you can construct a slice that fits your subslicing criteria, you can do assignment.

154

answered Oct 20 '22 06:10

James

It is called combining advanced and basic indexing. In combining advanced and basic indexing, numpy do the indexing in the advanced indexing first and subspace/concatenate the result to the dimension of basic indexing.

Example from docs:

Let x.shape be (10,20,30,40,50) and suppose ind_1 and ind_2 can be broadcast to the shape (2,3,4). Then x[:,ind_1,ind_2] has shape (10,2,3,4,40,50) because the (20,30)-shaped subspace from X has been replaced with the (2,3,4) subspace from the indices. However, x[:,ind_1,:,ind_2] has shape (2,3,4,10,30,50) because there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning. It is always possible to use .transpose() to move the subspace anywhere desired. Note that this example cannot be replicated using take.

so, on x[0,:,:,y], 0 and y are advance indexing. They are broadcast together to yield dimension (3,).

Click to copy

In [239]: np.broadcast(0,y).shape
Out[239]: (3,)

This (3,) tacks to the beginning of 2nd and 3rd dimension to make (3, 2, 50)

To see that the 1st and last dimension are really broadcasting together, you may try change 0 to [0,1] to see the error of broadcasting

Click to copy

print(x[[0,1],:,:,y])

Output:
IndexError                                Traceback (most recent call last)
<ipython-input-232-5d10156346f5> in <module>
----> 1 x[[0,1],:,:,y]

IndexError: shape mismatch: indexing arrays could not be broadcast together with
 shapes (2,) (3,)

answered Oct 20 '22 06:10

Andy L.

Related questions
                            
                                Merging a list of time-range tuples that have overlapping time-ranges
                            
                                Equivalent of R/ifelse in Python/Pandas? Compare string columns?
                            
                                Does TensorFlow have cross validation implemented for its users?
                            
                                AWS lambda - Release /tmp storage after each execution
                            
                                Preventing namespace collisions between private and pypi-based Python packages
                            
                                Use %20 instead of + for space in python query parameters
                            
                                How to close Boto S3 connection?
                            
                                How is the TFIDFVectorizer in scikit-learn supposed to work?
                            
                                Mypy/typeshed stubs for Pandas
                            
                                Twisted + SQLAlchemy and the best way to do it
                            
                                How to debug python CLI that takes stdin?
                            
                                Capturing output of python script run inside a docker container
                            
                                Why can't matplotlib plot in a different thread?
                            
                                Displaying graphs/charts in Django [closed]
                            
                                General approach to developing an image classification algorithm for Dilbert cartoons
                            
                                Python ConfigParser: Checking for option existence
                            
                                Change Matplotlib's default font
                            
                                Pandas' merge returns a column with _x appended to the name
                            
                                TensorFlow - numpy-like tensor indexing
                            
                                Python __enter__ / __exit__ vs __init__ (or __new__) / __del__

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Weird indexing using numpy

Tags:

python

numpy

Paul Scotti

People also ask

2 Answers

Edit:

James

Andy L.

Recent Activity

Donate For Us