
Subsetting a 2D numpy array

I have looked through the documentation and other questions here, but I still have not got the hang of subsetting numpy arrays.

I have a numpy array, and for the sake of argument, let it be defined as follows:

import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
#        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
#        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
#        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
#        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
#        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
#        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Now I want to select the rows and columns of a specified by the vectors n1 and n2. As an example:

n1 = range(5)
n2 = range(5)

But when I use:

b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])

Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:

b = a[n1,:]
b = b[:,n2]
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [20, 21, 22, 23, 24],
#        [30, 31, 32, 33, 34],
#        [40, 41, 42, 43, 44]])

But I am sure there should be a way to do this simple task in just one command.

CrossEntropy asked Jun 18 '15 14:06



2 Answers

You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.

There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the array can be "regularly strided", and therefore whether or not a copy needs to be made. Arbitrary sequences therefore have to be treated differently, if we want to be able to create "views" without making copies.
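As a quick check of the view-versus-copy distinction, here is a minimal sketch using np.shares_memory. The variable names (view, fancy, and so on) are just illustrative choices, not anything from the question:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# "Normal" indexing with slices: the result is a view onto a's memory.
view = a[:5, :5]

# "Fancy" indexing with a list: the result is a new array with its own memory.
fancy = a[[0, 1, 2, 3, 4], :]

slice_shares = np.shares_memory(a, view)    # True
fancy_shares = np.shares_memory(a, fancy)   # False

# Writing through the view changes the original array...
view[0, 0] = -1
# ...while writing to the fancy-indexed copy does not.
fancy[1, 0] = -2

print(slice_shares, fancy_shares)
print(a[0, 0], a[1, 0])   # -1 10
```

The practical consequence is that modifying a slice modifies the original array, while modifying the result of fancy indexing does not.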

In your case:

import numpy as np

a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)

# Not what you want
b = a[n1, n2]  # array([ 0, 11, 22, 33, 44])

# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]

# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]

Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.

print("Fancy Indexing:")
print(a[n1, n2])

print("Manual indexing:")
for i, j in zip(n1, n2):
    print(a[i, j])

However, if the sequences you're indexing with are 2D (a row vector and a column vector, in this case), the result is different. Instead of simply "zipping the two together", numpy broadcasts the index arrays against each other, so every index in one gets paired with every index in the other.

In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.

In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
Out[4]:
array([[11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])

In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])

To be a bit more precise,

a[[[1, 2, 3]], [[1],[2],[3]]] 

is treated exactly like:

i = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]
j = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
a[i, j]

In other words, whether the input is a row/column vector is a shorthand for how the indices should repeat in the indexing.
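To see the repetition explicitly, np.broadcast_arrays can expand a row vector and a column vector to their common shape. This is only a sketch of what happens conceptually during indexing, and the names row_idx, col_idx, full_rows and full_cols are made up for illustration:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

row_idx = np.array([[1, 2, 3]])        # shape (1, 3), indexes axis 0
col_idx = np.array([[1], [2], [3]])    # shape (3, 1), indexes axis 1

# Expand both index arrays to the common (3, 3) shape.
full_rows, full_cols = np.broadcast_arrays(row_idx, col_idx)
print(full_rows)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]
print(full_cols)
# [[1 1 1]
#  [2 2 2]
#  [3 3 3]]

# Indexing with the compact vectors matches indexing with the expanded arrays.
compact = a[row_idx, col_idx]
expanded = a[full_rows, full_cols]
same = np.array_equal(compact, expanded)
print(same)   # True
```

Swapping the two shapes (a column vector for axis 0 and a row vector for axis 1) gives the transposed block, which is the ordering np.ix_ produces.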


np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:

In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
        [2],
        [3]]),
 array([[1, 2, 3]]))

Similarly (the sparse argument would make it identical to ix_ above):

In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
Out[7]:
[array([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]]),
 array([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])]
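For what it's worth, the remark about the sparse argument can be checked with a short sketch like the one below. The names rows, cols, sparse_grid and open_mesh are illustrative, and the tuple() call is there because np.meshgrid returns a list in some NumPy versions and a tuple in others:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows, cols = [1, 2, 3], [1, 2, 3]

# sparse=True keeps each index array as a (3, 1) column or a (1, 3) row
# instead of expanding both to the full (3, 3) grid.
sparse_grid = np.meshgrid(rows, cols, indexing='ij', sparse=True)
open_mesh = np.ix_(rows, cols)

matches = all(np.array_equal(s, o) and s.shape == o.shape
              for s, o in zip(sparse_grid, open_mesh))
print(matches)   # True

# Either one can be used as the index; convert to a tuple first.
block = a[tuple(sparse_grid)]
print(block)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]
```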
Joe Kington answered Oct 05 '22 21:10


Another quick way to build the desired index is to use the np.ix_ function:

>>> a[np.ix_(n1, n2)]
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

This provides a convenient way to construct an open mesh from sequences of indices.
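The same approach should also work when the rows and columns are not contiguous, and on the left-hand side of an assignment. A small sketch, with rows and cols chosen arbitrarily for illustration:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

rows = [0, 3, 7]   # arbitrary, non-contiguous row indices
cols = [1, 2, 9]   # arbitrary, non-contiguous column indices

# Select the 3x3 block at the intersections of those rows and columns.
block = a[np.ix_(rows, cols)]
print(block)
# [[ 1  2  9]
#  [31 32 39]
#  [71 72 79]]

# The same index can be used to assign into the block in place.
a[np.ix_(rows, cols)] = 0
print(a[3, 2])       # 0
print(block[1, 1])   # 32, because block is a copy and is not affected
```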

Alex Riley answered Oct 05 '22 21:10