
numpy slice an array without copying it

I have a large dataset in a matrix x and I need to analyze some of its submatrices.

I am using the following code to select the submatrix:

>>> import numpy as np
>>> x = np.random.normal(0,1,(20,2))
>>> x
array([[-1.03266826,  0.04646684],
       [ 0.05898304,  0.31834926],
       [-0.1916809 , -0.97929025],
       [-0.48837085, -0.62295003],
       [-0.50731017,  0.50305894],
       [ 0.06457385, -0.10670002],
       [-0.72573604,  1.10026385],
       [-0.90893845,  0.99827162],
       [ 0.20714399, -0.56965615],
       [ 0.8041371 ,  0.21910274],
       [-0.65882317,  0.2657183 ],
       [-1.1214074 , -0.39886425],
       [ 0.0784783 , -0.21630006],
       [-0.91802557, -0.20178683],
       [ 0.88268539, -0.66470235],
       [-0.03652459,  1.49798484],
       [ 1.76329838, -0.26554555],
       [-0.97546845, -2.41823586],
       [ 0.32335103, -1.35091711],
       [-0.12981597,  0.27591674]])
>>> index = x[:,1] > 0
>>> index
array([ True,  True, False, False,  True, False,  True,  True, False,
        True,  True, False, False, False, False,  True, False, False,
       False,  True], dtype=bool)
>>> x1 = x[index, :] #x1 is a copy of the submatrix
>>> x1
array([[-1.03266826,  0.04646684],
       [ 0.05898304,  0.31834926],
       [-0.50731017,  0.50305894],
       [-0.72573604,  1.10026385],
       [-0.90893845,  0.99827162],
       [ 0.8041371 ,  0.21910274],
       [-0.65882317,  0.2657183 ],
       [-0.03652459,  1.49798484],
       [-0.12981597,  0.27591674]])
>>> x1[0,0] = 1000
>>> x1
array([[  1.00000000e+03,   4.64668400e-02],
       [  5.89830401e-02,   3.18349259e-01],
       [ -5.07310170e-01,   5.03058935e-01],
       [ -7.25736045e-01,   1.10026385e+00],
       [ -9.08938455e-01,   9.98271624e-01],
       [  8.04137104e-01,   2.19102741e-01],
       [ -6.58823174e-01,   2.65718300e-01],
       [ -3.65245877e-02,   1.49798484e+00],
       [ -1.29815968e-01,   2.75916735e-01]])
>>> x
array([[-1.03266826,  0.04646684],
       [ 0.05898304,  0.31834926],
       [-0.1916809 , -0.97929025],
       [-0.48837085, -0.62295003],
       [-0.50731017,  0.50305894],
       [ 0.06457385, -0.10670002],
       [-0.72573604,  1.10026385],
       [-0.90893845,  0.99827162],
       [ 0.20714399, -0.56965615],
       [ 0.8041371 ,  0.21910274],
       [-0.65882317,  0.2657183 ],
       [-1.1214074 , -0.39886425],
       [ 0.0784783 , -0.21630006],
       [-0.91802557, -0.20178683],
       [ 0.88268539, -0.66470235],
       [-0.03652459,  1.49798484],
       [ 1.76329838, -0.26554555],
       [-0.97546845, -2.41823586],
       [ 0.32335103, -1.35091711],
       [-0.12981597,  0.27591674]])
>>> 

but I would like x1 to be only a pointer into x, or something similar. Copying the data every time I need a submatrix is too expensive for me. How can I do that?

EDIT: Apparently there is no solution with a NumPy array. Are pandas DataFrames any better from this point of view?

Donbeo asked May 14 '15 13:05




3 Answers

Since index is an array of type bool, you are doing advanced indexing. And the docs say: "Advanced indexing always returns a copy of the data."

This makes a lot of sense. With normal (basic) indexing, the result is fully described by a start, stop and step per dimension. Advanced indexing can pick any elements of the original array, with no such simple rule. A view would therefore have to carry extra metadata recording which elements it refers to, and that metadata could take more memory than a copy.
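A minimal sketch of the difference, assuming a small made-up array rather than the one from the question (the names `view`, `rows` and `picked` are illustrative only). `np.shares_memory` reports whether two arrays overlap in memory:

```python
import numpy as np

x = np.arange(10.0).reshape(5, 2)   # illustrative data: rows [0,1], [2,3], ..., [8,9]

view = x[1:4]            # basic slicing: described by start/stop/step, so a view
rows = x[:, 1] > 4       # boolean mask, which triggers advanced indexing
picked = x[rows]         # advanced indexing: the selected rows are copied

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, picked))  # False

view[0, 0] = 1000.0      # writes through to x[1, 0]
picked[0, 0] = -1.0      # changes only the copy; x[2, 0] is untouched
print(x[1, 0], x[2, 0])  # 1000.0 4.0
```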

Mike Müller answered Oct 08 '22 13:10



The information describing your array x is summarized in its .__array_interface__ property:

In [433]: x.__array_interface__
Out[433]: 
{'descr': [('', '<f8')],
 'strides': None,
 'data': (171396104, False),
 'typestr': '<f8',
 'version': 3,
 'shape': (20, 2)}

It has the array shape, the strides (None here, meaning the default C-contiguous layout), and a pointer to the data buffer. A view can point to the same data buffer (possibly further along), and have its own shape and strides.

But indexing with your boolean array can't be summarized in those few numbers. The result either has to carry the index array all the way through, or copy the selected items from the x data buffer. numpy chooses to copy. You do have a choice of when to apply the index: now, or further down the call stack.
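One way to read "apply the index later" is to keep the boolean mask around and index x with it only at the point of use. The sketch below assumes made-up data, and the names `col_mean` and `offset` are hypothetical. It also shows that a view differs from its parent only in the few numbers of the array interface:

```python
import numpy as np

x = np.arange(10.0).reshape(5, 2)   # illustrative data, float64, C-contiguous
index = x[:, 1] > 4                 # cheap to keep around: one bool per row

# Assigning through the mask modifies x in place; no submatrix is built first.
x[index, 0] = 1000.0

# Reading through the mask still copies, but only the values this expression needs.
col_mean = x[index, 1].mean()
print(col_mean)                     # 7.0

# A view is described by the same kind of numbers as x: pointer, shape, strides.
view = x[2:]
offset = view.__array_interface__['data'][0] - x.__array_interface__['data'][0]
print(offset)                       # 32 bytes: two rows of two 8-byte floats
```

Whether this is cheaper than copying once depends on how often the selection is reused, since every indexed read repeats the gather.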

hpaulj answered Oct 08 '22 11:10



If you can manage with a traditional slice such as

x1 = x[3:8]

then x1 will be a view that points into x's data, not a copy.

Have you looked at using masked arrays? You might be able to do exactly what you want.

x = np.array([[0.12, 0.23],
              [1.23, 3.32],
               ...
              [0.75, 1.23]])

data = np.array([[False, False],
                 [True, True],
                ...
                 [True, True]])

x1 = np.ma.array(x, mask=data)
## x1 can be worked on and only includes elements of x where data==False
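A runnable sketch of the masked-array idea, assuming the goal is to hide whole rows whose second column is not above some threshold (the values, the threshold of 1 and the name `hide` are made up for illustration). `np.ma.array` defaults to `copy=False`, so the masked array wraps x's buffer instead of duplicating it:

```python
import numpy as np

x = np.array([[0.12, 0.23],
              [1.23, 3.32],
              [0.75, 1.23]])        # made-up values

# True means "hide this element"; repeat the per-row condition across both columns.
hide = np.repeat((x[:, 1] <= 1)[:, None], x.shape[1], axis=1)

x1 = np.ma.array(x, mask=hide)      # copy=False by default: no copy of x's data
print(np.shares_memory(x, x1.data)) # True
print(x1.mean(axis=0))              # column means over the unmasked rows only

x1[1, 0] = 1000.0                   # writing to an unmasked element reaches x
print(x[1, 0])                      # 1000.0
```

The mask itself still costs one boolean per element, and masked operations are generally slower than plain ndarray operations, so this trades memory for speed rather than being free.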
paddyg answered Oct 08 '22 11:10
