Subsetting a 2D numpy array

Tags: python, arrays, multidimensional-array, numpy, subset

I have looked through the documentation and other questions here, but it seems I still haven't got the hang of subsetting numpy arrays.

I have a numpy array, and for the sake of argument, let it be defined as follows:

    import numpy as np
    a = np.arange(100)
    a.shape = (10,10)
    # array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
    #        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
    #        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    #        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
    #        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
    #        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
    #        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
    #        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
    #        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
    #        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Now I want to select the rows and columns of a specified by the vectors n1 and n2. As an example:

    n1 = range(5)
    n2 = range(5)

But when I use:

    b = a[n1,n2]
    # array([ 0, 11, 22, 33, 44])

Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:

    b = a[n1,:]
    b = b[:,n2]
    # array([[ 0,  1,  2,  3,  4],
    #        [10, 11, 12, 13, 14],
    #        [20, 21, 22, 23, 24],
    #        [30, 31, 32, 33, 34],
    #        [40, 41, 42, 43, 44]])

But I am sure there should be a way to do this simple task in just one command.


asked Jun 18 '15 14:06

CrossEntropy

2 Answers

You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.

There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the result can be "regularly strided", and therefore whether or not a copy needs to be made. A slice always picks out a regularly strided subset, so numpy can return a "view" without copying anything. An arbitrary sequence of indices can't be described that way, so it has to be treated differently and always produces a copy.
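To see the view/copy distinction concretely, here is a small sketch. It uses `np.shares_memory` to check whether the result still points at the original buffer; the variable names `view` and `copy` are just illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# Basic slicing: describable by an offset plus strides, so NumPy returns a view.
view = a[:5, :5]
# Fancy indexing: arbitrary indices, so NumPy has to build a new array.
copy = a[np.arange(5)[:, None], np.arange(5)]

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

# Writing through the view changes the original; writing to the copy does not.
view[0, 0] = -1
copy[1, 1] = -2
print(a[0, 0], a[1, 1])  # -1 11
```

This matters in practice: modifying a sliced block silently modifies `a`, while a fancy-indexed block is safe to change.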

In your case:

    import numpy as np

    a = np.arange(100).reshape(10,10)
    n1, n2 = np.arange(5), np.arange(5)

    # Not what you want
    b = a[n1, n2]  # array([ 0, 11, 22, 33, 44])

    # What you want, but only for simple sequences
    # Note that no copy of *a* is made!! This is a view.
    b = a[:5, :5]

    # What you want, but probably confusing at first. (Also, makes a copy.)
    # np.meshgrid and np.ix_ are basically equivalent to this.
    b = a[n1[:,None], n2[None,:]]

Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.

    print("Fancy indexing:")
    print(a[n1, n2])

    print("Manual indexing:")
    for i, j in zip(n1, n2):
        print(a[i, j])

However, if the sequences you're indexing with are 2D (a row vector and a column vector, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy broadcasts the index arrays against each other, so every row index gets paired with every column index.

In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.

    In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
    Out[4]:
    array([[11, 21, 31],
           [12, 22, 32],
           [13, 23, 33]])

    In [5]: a[[1, 2, 3], [1, 2, 3]]
    Out[5]: array([11, 22, 33])

To be a bit more precise,

a[[[1, 2, 3]], [[1],[2],[3]]]

is treated exactly like:

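    # row indices: [[1, 2, 3]] repeated down the rows
    i = [[1, 2, 3],
         [1, 2, 3],
         [1, 2, 3]]
    # column indices: [[1], [2], [3]] repeated across the columns
    j = [[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]]
    a[i, j]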

In other words, whether the input is a row/column vector is a shorthand for how the indices should repeat in the indexing.
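If you want to check the "repeat" rule yourself, one way (a sketch, with illustrative names `rows_b` and `cols_b`) is to ask numpy to do the repetition explicitly with `np.broadcast_arrays` and compare the results:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

rows = np.array([[1, 2, 3]])      # shape (1, 3): a row vector of row indices
cols = np.array([[1], [2], [3]])  # shape (3, 1): a column vector of column indices

# Explicitly repeat both index arrays out to the common shape (3, 3).
rows_b, cols_b = np.broadcast_arrays(rows, cols)
print(rows_b)
print(cols_b)

# Indexing with the compact and the expanded forms gives the same result.
same = np.array_equal(a[rows, cols], a[rows_b, cols_b])
print(same)  # True
```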

np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:

    In [6]: np.ix_([1, 2, 3], [1, 2, 3])
    Out[6]:
    (array([[1],
            [2],
            [3]]), array([[1, 2, 3]]))

Similarly (the sparse argument would make it identical to ix_ above):

    In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
    Out[7]:
    [array([[1, 1, 1],
            [2, 2, 2],
            [3, 3, 3]]),
     array([[1, 2, 3],
            [1, 2, 3],
            [1, 2, 3]])]
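As a rough check that these really are interchangeable for indexing, the sketch below builds the index arrays three ways and compares the selected blocks. The index lists `rows` and `cols` are made up for illustration and deliberately differ in length.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows, cols = [1, 2, 3], [4, 6]

# Dense grids: both index arrays have the full (3, 2) shape.
ii, jj = np.meshgrid(rows, cols, indexing='ij')
# Sparse grids: shapes (3, 1) and (1, 2), the same layout np.ix_ produces.
si, sj = np.meshgrid(rows, cols, indexing='ij', sparse=True)

block = a[np.ix_(rows, cols)]
print(np.array_equal(a[ii, jj], block))  # True
print(np.array_equal(a[si, sj], block))  # True
print(si.shape, sj.shape)                # (3, 1) (1, 2)
```

Note that the grids are unpacked and passed as separate indices (`a[ii, jj]`) rather than handing the whole meshgrid result to `a[...]` in one go.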


answered Oct 05 '22 21:10

Joe Kington

Another quick way to build the desired index is to use the np.ix_ function:

    >>> a[np.ix_(n1, n2)]
    array([[ 0,  1,  2,  3,  4],
           [10, 11, 12, 13, 14],
           [20, 21, 22, 23, 24],
           [30, 31, 32, 33, 34],
           [40, 41, 42, 43, 44]])

This provides a convenient way to construct an open mesh from sequences of indices.
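The same approach should work for arbitrary, non-contiguous rows and columns, and for assignment as well. A small sketch, with made-up index lists `rows` and `cols`:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows = [0, 3, 7]
cols = [1, 4]

block = a[np.ix_(rows, cols)]
print(block)
# [[ 1  4]
#  [31 34]
#  [71 74]]

# One-step equivalent of the two-step a[rows, :][:, cols] from the question.
print(np.array_equal(block, a[rows, :][:, cols]))  # True

# Assigning through the same index writes into a itself.
a[np.ix_(rows, cols)] = 0
print(a[3, 4], a[3, 5])  # 0 35
```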

answered Oct 05 '22 21:10

Alex Riley
