Slicing of a NumPy 2d array, or how do I extract an mxm submatrix from an nxn array (n>m)?

Tags:

I want to slice a NumPy nxn array. I want to extract an arbitrary selection of m rows and columns of that array (i.e. without any pattern in the numbers of rows/columns), making it a new, mxm array. For this example let us say the array is 4x4 and I want to extract a 2x2 array from it.

Here is our array:

from numpy import * x = range(16) x = reshape(x,(4,4))  print x [[ 0  1  2  3]  [ 4  5  6  7]  [ 8  9 10 11]  [12 13 14 15]]

The line and columns to remove are the same. The easiest case is when I want to extract a 2x2 submatrix that is at the beginning or at the end, i.e. :

In [33]: x[0:2,0:2] Out[33]:  array([[0, 1],        [4, 5]])  In [34]: x[2:,2:] Out[34]:  array([[10, 11],        [14, 15]])

But what if I need to remove another mixture of rows/columns? What if I need to remove the first and third lines/rows, thus extracting the submatrix [[5,7],[13,15]]? There can be any composition of rows/lines. I read somewhere that I just need to index my array using arrays/lists of indices for both rows and columns, but that doesn't seem to work:

In [35]: x[[1,3],[1,3]] Out[35]: array([ 5, 15])

I found one way, which is:

    In [61]: x[[1,3]][:,[1,3]] Out[61]:  array([[ 5,  7],        [13, 15]])

First issue with this is that it is hardly readable, although I can live with that. If someone has a better solution, I'd certainly like to hear it.

Other thing is I read on a forum that indexing arrays with arrays forces NumPy to make a copy of the desired array, thus when treating with large arrays this could become a problem. Why is that so / how does this mechanism work?

712

asked Nov 23 '10 15:11

levesque

2 Answers

To answer this question, we have to look at how indexing a multidimensional array works in Numpy. Let's first say you have the array x from your question. The buffer assigned to x will contain 16 ascending integers from 0 to 15. If you access one element, say x[i,j], NumPy has to figure out the memory location of this element relative to the beginning of the buffer. This is done by calculating in effect i*x.shape[1]+j (and multiplying with the size of an int to get an actual memory offset).

If you extract a subarray by basic slicing like y = x[0:2,0:2], the resulting object will share the underlying buffer with x. But what happens if you acces y[i,j]? NumPy can't use i*y.shape[1]+j to calculate the offset into the array, because the data belonging to y is not consecutive in memory.

NumPy solves this problem by introducing strides. When calculating the memory offset for accessing x[i,j], what is actually calculated is i*x.strides[0]+j*x.strides[1] (and this already includes the factor for the size of an int):

x.strides (16, 4)

When y is extracted like above, NumPy does not create a new buffer, but it does create a new array object referencing the same buffer (otherwise y would just be equal to x.) The new array object will have a different shape then x and maybe a different starting offset into the buffer, but will share the strides with x (in this case at least):

y.shape (2,2) y.strides (16, 4)

This way, computing the memory offset for y[i,j] will yield the correct result.

But what should NumPy do for something like z=x[[1,3]]? The strides mechanism won't allow correct indexing if the original buffer is used for z. NumPy theoretically could add some more sophisticated mechanism than the strides, but this would make element access relatively expensive, somehow defying the whole idea of an array. In addition, a view wouldn't be a really lightweight object anymore.

This is covered in depth in the NumPy documentation on indexing.

Oh, and nearly forgot about your actual question: Here is how to make the indexing with multiple lists work as expected:

x[[[1],[3]],[1,3]]

This is because the index arrays are broadcasted to a common shape. Of course, for this particular example, you can also make do with basic slicing:

x[1::2, 1::2]

138

answered Sep 22 '22 14:09

Sven Marnach

As Sven mentioned, x[[[0],[2]],[1,3]] will give back the 0 and 2 rows that match with the 1 and 3 columns while x[[0,2],[1,3]] will return the values x[0,1] and x[2,3] in an array.

There is a helpful function for doing the first example I gave, numpy.ix_. You can do the same thing as my first example with x[numpy.ix_([0,2],[1,3])]. This can save you from having to enter in all of those extra brackets.

answered Sep 18 '22 14:09

Justin Peel

Related questions
                            
                                Does "\d" in regex mean a digit?
                            
                                How do I count unique values inside a list
                            
                                Python Infinity - Any caveats?
                            
                                Why do Python's math.ceil() and math.floor() operations return floats instead of integers?
                            
                                Locking a file in Python
                            
                                Using Django time/date widgets in custom form
                            
                                'str' object does not support item assignment
                            
                                Seeking clarification on apparent contradictions regarding weakly typed languages
                            
                                What are the differences between the threading and multiprocessing modules?
                            
                                Numpy where function multiple conditions
                            
                                What's a standard way to do a no-op in python?
                            
                                Any way to clear python's IDLE window?
                            
                                type object 'datetime.datetime' has no attribute 'datetime'
                            
                                How do you use the ellipsis slicing syntax in Python?
                            
                                How do you create nested dict in Python?
                            
                                Basic http file downloading and saving to disk in python?
                            
                                Counting the number of True Booleans in a Python List
                            
                                How to get item's position in a list?
                            
                                Does Python SciPy need BLAS?
                            
                                setting an environment variable in virtualenv

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Slicing of a NumPy 2d array, or how do I extract an mxm submatrix from an nxn array (n>m)?

Tags:

python

slice

numpy

levesque

People also ask

2 Answers

Sven Marnach

Justin Peel

Recent Activity

Donate For Us