
NumPy: Why the need to explicitly copy a value?

Tags: python, numpy

import numpy as np

arr = np.arange(0, 11)
slice_of_arr = arr[0:6]
slice_of_arr[:] = 99

# slice_of_arr returns
array([99, 99, 99, 99, 99, 99])
# arr returns
array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

As the example above shows, you cannot change the values of slice_of_arr without also changing arr, because slice_of_arr is a view of arr, not an independent array.
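For comparison, here is a minimal sketch of the explicit-copy workaround the question refers to. The name slice_copy is only illustrative and does not come from the original post.

```python
import numpy as np

arr = np.arange(0, 11)

# .copy() allocates a new data buffer, so the result no longer aliases arr.
slice_copy = arr[0:6].copy()
slice_copy[:] = 99

print(slice_copy)  # [99 99 99 99 99 99]
print(arr)         # [ 0  1  2  3  4  5  6  7  8  9 10]
```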

My questions are:

  1. Why is NumPy designed like this? Wouldn't it be tedious to call .copy every time before assigning a value?
  2. Is there anything I can do to get rid of the .copy? How can I change this default behavior of NumPy?
asked Jul 23 '15 07:07 by cqcn1991



2 Answers

I think you have the answers in the other comments, but more specifically:

1.a. Why is NumPy designed like this?
Because creating a view takes constant time, whereas copying the data into a whole new array takes linear time.
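A small sketch of what "view" means in practice, assuming only standard NumPy calls (np.shares_memory and the .base attribute). The variable names are illustrative. The slice reuses the data buffer of arr, so no elements are copied when it is created, while .copy() has to allocate and fill a new buffer.

```python
import numpy as np

arr = np.arange(0, 11)

view = arr[0:6]           # new array object, same underlying buffer
copied = arr[0:6].copy()  # new array object with its own buffer

shares_view = np.shares_memory(arr, view)    # True: the slice aliases arr
shares_copy = np.shares_memory(arr, copied)  # False: independent data
view_base_is_arr = view.base is arr          # True: the view points back at arr

print(shares_view, shares_copy, view_base_is_arr)
```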

1.b. Wouldn't it be tedious every time you need to .copy and then assign value?
Actually, it's not that common to need a copy. So no, it's not tedious. Even if it can be surprising at first, this design is very good.

2.a. Is there anything I can do, to get rid of the .copy?
I can't really tell without seeing real code. In the toy example you give, you can't avoid creating a copy, but in real code you usually apply functions to the data, and those return another array, so a copy isn't needed.
Can you give an example of real code where you need to call .copy repeatedly?
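To illustrate the point about functions returning another array, here is a hedged sketch. The operations chosen (multiplication and sum) are just examples and not taken from the question.

```python
import numpy as np

arr = np.arange(0, 11)

# Arithmetic on a view produces a new array; the original is not touched.
doubled = arr[0:6] * 2
total = arr[0:6].sum()

# Writing into the result does not leak back into arr.
doubled[:] = 99

print(total)  # 15
print(arr)    # [ 0  1  2  3  4  5  6  7  8  9 10]
```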

2.b. How can I change this default behavior of NumPy?
You can't. Try to get used to it, and you'll see how powerful it is.

answered Nov 05 '22 17:11 by J. Martinot-Lagarde


The question "What does (numpy) __array_wrap__ do?" talks about ndarray subclasses and hooks like __array_wrap__. np.array takes a copy parameter that forces the result to be a copy, even if it isn't required by other considerations. ravel returns a view where it can, while flatten always returns a copy. So it is probably possible, and maybe not too difficult, to construct an ndarray subclass that forces a copy. It may involve modifying a hook like __array_wrap__.
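The copy-related calls mentioned above can be checked with a short sketch. The array shape and variable names are assumptions made for illustration.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

forced = np.array(arr[0:2], copy=True)  # always a new buffer
raveled = arr.ravel()                   # a view here, because arr is contiguous
flattened = arr.flatten()               # always a copy

print(np.shares_memory(arr, forced))     # False
print(np.shares_memory(arr, raveled))    # True
print(np.shares_memory(arr, flattened))  # False
```

Note that ravel falls back to a copy when the data cannot be exposed as a flat view, for example on some non-contiguous arrays.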

Or maybe it involves modifying the __getitem__ method. Indexing as in slice_of_arr = arr[0:6] involves a call to __getitem__. For ndarray this is compiled, but for a masked array it is Python code that you could use as an example:

/usr/lib/python3/dist-packages/numpy/ma/core.py

It may be something as simple as

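def __getitem__(self, indx):
    """x.__getitem__(y) <==> x[y]
    """
    # The masked-array code takes a view here:
    #     _data = ndarray.view(self, ndarray)
    # Untested idea: copy that view instead. ndarray.copy takes an
    # optional order argument, not a type, so call it with no arguments.
    _data = ndarray.view(self, ndarray).copy()
    dout = ndarray.__getitem__(_data, indx)
    return dout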

But I suspect that by the time you develop and fully test such a subclass, you might fall in love with the default no-copy approach. While this view-vs-copy business bites many newcomers (especially those coming from MATLAB), I haven't seen complaints from experienced users. Look at other numpy SO questions; you won't see a lot of copy() calls.

Even regular Python users are used to asking themselves whether a reference or slice is a copy or not, and whether something is mutable or not.

For example, with lists:

In [754]: ll=[1,2,[3,4,5],6]
In [755]: llslice=ll[1:-1]
In [756]: llslice[1][1:2]=[10,11,12]
In [757]: ll
Out[757]: [1, 2, [3, 10, 11, 12, 5], 6]

Modifying an item inside a slice modifies that same item in the original list. In contrast to numpy, a list slice is a copy, but it is a shallow copy. You have to take extra effort to make a deep copy (import copy).
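The difference between the shallow slice and a deep copy can be sketched with the standard library copy module. The names shallow and deep are illustrative.

```python
import copy

ll = [1, 2, [3, 4, 5], 6]

shallow = ll[1:-1]             # new outer list, same inner list object
deep = copy.deepcopy(ll[1:-1]) # new outer list and new inner list

shallow[1][1:2] = [10, 11, 12] # mutates the inner list shared with ll
shallow[0] = 0                 # rebinding a slot in the slice leaves ll alone

print(ll)    # [1, 2, [3, 10, 11, 12, 5], 6]
print(deep)  # [2, [3, 4, 5]]
```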

/usr/lib/python3/dist-packages/numpy/lib/index_tricks.py contains some indexing functions aimed at making certain indexing operations more convenient. Several are actually classes, or class instances, with custom __getitem__ methods. They may also serve as models of how to customize your slicing and indexing.
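Two of the objects from that module, np.r_ and np.s_, show the pattern of a class instance with a custom __getitem__. This is only a usage sketch with illustrative variable names, not a walkthrough of their implementation.

```python
import numpy as np

# np.r_ is an instance, not a function: indexing it builds an array.
joined = np.r_[0:3, 5]

# np.s_ turns indexing syntax into a reusable slice object.
window = np.s_[0:6]

arr = np.arange(0, 11)
first_six = arr[window]  # same as arr[0:6], so still a view of arr

print(joined)     # [0 1 2 5]
print(window)     # slice(0, 6, None)
print(first_six)  # [0 1 2 3 4 5]
```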

answered Nov 05 '22 19:11 by hpaulj