I want to slice a NumPy nxn array. I want to extract an arbitrary selection of m rows and columns of that array (i.e. without any pattern in the numbers of rows/columns), making it a new, mxm array. For this example let us say the array is 4x4 and I want to extract a 2x2 array from it.
Here is our array:
from numpy import * x = range(16) x = reshape(x,(4,4)) print x [[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11] [12 13 14 15]]
The line and columns to remove are the same. The easiest case is when I want to extract a 2x2 submatrix that is at the beginning or at the end, i.e. :
In [33]: x[0:2,0:2] Out[33]: array([[0, 1], [4, 5]]) In [34]: x[2:,2:] Out[34]: array([[10, 11], [14, 15]])
But what if I need to remove another mixture of rows/columns? What if I need to remove the first and third lines/rows, thus extracting the submatrix [[5,7],[13,15]]
? There can be any composition of rows/lines. I read somewhere that I just need to index my array using arrays/lists of indices for both rows and columns, but that doesn't seem to work:
In [35]: x[[1,3],[1,3]] Out[35]: array([ 5, 15])
I found one way, which is:
In [61]: x[[1,3]][:,[1,3]] Out[61]: array([[ 5, 7], [13, 15]])
First issue with this is that it is hardly readable, although I can live with that. If someone has a better solution, I'd certainly like to hear it.
Other thing is I read on a forum that indexing arrays with arrays forces NumPy to make a copy of the desired array, thus when treating with large arrays this could become a problem. Why is that so / how does this mechanism work?
To slice elements from two-dimensional arrays, you need to specify both a row index and a column index as [row_index, column_index] . For example, you can use the index [1,2] to query the element at the second row, third column in precip_2002_2013 .
As with indexing, the array you get back when you index or slice a numpy array is a view of the original array. It is the same data, just accessed in a different order. This is different to lists, where a slice returns a completely new list.
To get the subarray we can use slicing to get the subarray. Step 1: Run a loop till length+1 of the given list. Step 2: Run another loop from 0 to i. Step 3: Slice the subarray from j to i.
You can save your NumPy arrays to CSV files using the savetxt() function. This function takes a filename and array as arguments and saves the array into CSV format. You must also specify the delimiter; this is the character used to separate each variable in the file, most commonly a comma.
To answer this question, we have to look at how indexing a multidimensional array works in Numpy. Let's first say you have the array x
from your question. The buffer assigned to x
will contain 16 ascending integers from 0 to 15. If you access one element, say x[i,j]
, NumPy has to figure out the memory location of this element relative to the beginning of the buffer. This is done by calculating in effect i*x.shape[1]+j
(and multiplying with the size of an int to get an actual memory offset).
If you extract a subarray by basic slicing like y = x[0:2,0:2]
, the resulting object will share the underlying buffer with x
. But what happens if you acces y[i,j]
? NumPy can't use i*y.shape[1]+j
to calculate the offset into the array, because the data belonging to y
is not consecutive in memory.
NumPy solves this problem by introducing strides. When calculating the memory offset for accessing x[i,j]
, what is actually calculated is i*x.strides[0]+j*x.strides[1]
(and this already includes the factor for the size of an int):
x.strides (16, 4)
When y
is extracted like above, NumPy does not create a new buffer, but it does create a new array object referencing the same buffer (otherwise y
would just be equal to x
.) The new array object will have a different shape then x
and maybe a different starting offset into the buffer, but will share the strides with x
(in this case at least):
y.shape (2,2) y.strides (16, 4)
This way, computing the memory offset for y[i,j]
will yield the correct result.
But what should NumPy do for something like z=x[[1,3]]
? The strides mechanism won't allow correct indexing if the original buffer is used for z
. NumPy theoretically could add some more sophisticated mechanism than the strides, but this would make element access relatively expensive, somehow defying the whole idea of an array. In addition, a view wouldn't be a really lightweight object anymore.
This is covered in depth in the NumPy documentation on indexing.
Oh, and nearly forgot about your actual question: Here is how to make the indexing with multiple lists work as expected:
x[[[1],[3]],[1,3]]
This is because the index arrays are broadcasted to a common shape. Of course, for this particular example, you can also make do with basic slicing:
x[1::2, 1::2]
As Sven mentioned, x[[[0],[2]],[1,3]]
will give back the 0 and 2 rows that match with the 1 and 3 columns while x[[0,2],[1,3]]
will return the values x[0,1] and x[2,3] in an array.
There is a helpful function for doing the first example I gave, numpy.ix_
. You can do the same thing as my first example with x[numpy.ix_([0,2],[1,3])]
. This can save you from having to enter in all of those extra brackets.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With