NumPy: Why the need to explicitly copy a value?

Tags: python, numpy

import numpy as np

arr = np.arange(0, 11)
slice_of_arr = arr[0:6]   # basic slicing returns a view of arr
slice_of_arr[:] = 99      # writes through the view into arr

# slice_of_arr returns
# array([99, 99, 99, 99, 99, 99])
# arr returns
# array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

As the example above shows, you cannot change the values of slice_of_arr without also changing arr, because slice_of_arr is a view of arr, not a new array.

My questions are:

1. Why is NumPy designed like this? Wouldn't it be tedious to call .copy every time before assigning values?
2. Is there anything I can do to get rid of the .copy? How can I change this default behavior of NumPy?

414

asked Jul 23 '15 07:07

cqcn1991

2 Answers

I think you have the answers in the other comments, but more specifically:

1.a. Why is NumPy designed like this?
Because creating a view is much faster (constant time) than creating a whole new array (linear time in the number of elements).
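
To make the view/copy distinction concrete, here is a minimal sketch using the array from the question. It only checks whether the two results share memory with arr; it is not a timing benchmark.

```python
import numpy as np

arr = np.arange(0, 11)

view = arr[0:6]            # basic slicing: no data is copied
copied = arr[0:6].copy()   # explicit copy: a new buffer is allocated

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
print(view.base is arr)               # True: the view refers back to arr
print(copied.base is None)            # True: the copy owns its data
```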

1.b. Wouldn't it be tedious to call .copy every time before assigning values?
Actually, it's not that common to need a copy, so no, it's not tedious. The design can be surprising at first, but it is a very good one.

2.a. Is there anything I can do to get rid of the .copy?
I can't really tell without seeing real code. In the toy example you give, you can't avoid creating a copy, but in real code you usually apply functions to the data, and those return a new array, so a copy isn't needed.
Can you give an example of real code where you need to call .copy repeatedly?
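
As a rough illustration of that point (the particular operations are just examples I picked, not taken from the question), arithmetic and most NumPy functions applied to a view already return a fresh array, so the original stays untouched without any explicit .copy:

```python
import numpy as np

arr = np.arange(0, 11)
part = arr[0:6]                 # a view of arr

doubled = part * 2              # arithmetic allocates a new array
clipped = np.clip(part, 1, 4)   # so do most NumPy functions

print(np.shares_memory(arr, doubled))  # False
print(np.shares_memory(arr, clipped))  # False
print(arr)                             # [ 0  1  2  3  4  5  6  7  8  9 10]
```

Only in-place operations on the view (slice assignment, `part *= 2`, an `out=` argument) write back into arr.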

2.b. How can I change this default behavior of NumPy?
You can't. Try to get used to it; you'll see how powerful it is.

171

answered Nov 05 '22 17:11

J. Martinot-Lagarde

The question "What does (numpy) __array_wrap__ do?" talks about ndarray subclasses and hooks like __array_wrap__. np.array takes a copy parameter that forces the result to be a copy, even if one isn't required by other considerations. ravel returns a view where possible; flatten always returns a copy. So it is probably possible, and maybe not too difficult, to construct an ndarray subclass that forces a copy. It may involve modifying a hook like __array_wrap__.
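
A small sketch of the copy-related calls mentioned above, checked with np.shares_memory. The 3x4 array is just an assumed example.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

forced = np.array(arr, copy=True)  # always a new array
raveled = arr.ravel()              # a view here, because arr is contiguous
flat = arr.flatten()               # always a copy

print(np.shares_memory(arr, forced))   # False
print(np.shares_memory(arr, raveled))  # True
print(np.shares_memory(arr, flat))     # False

# ravel falls back to a copy when it cannot return a view,
# for example on the transpose of a C-contiguous array.
print(np.shares_memory(arr, arr.T.ravel()))  # False
```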

Or maybe modifying the .__getitem__ method. Indexing as in slice_of_arr = arr[0:6] involves a call to __getitem__. For ndarray this is compiled, but for a masked array, it is python code that you could use as an example:

/usr/lib/python3/dist-packages/numpy/ma/core.py

It may be something as simple as

def __getitem__(self, indx):
    """x.__getitem__(y) <==> x[y]
    """
    # _data = ndarray.view(self, ndarray) # change to:
    _data = ndarray.copy(self)  # copy() takes no type argument, unlike view()
    dout = ndarray.__getitem__(_data, indx)
    return dout

But I suspect that by the time you develop and fully test such a subclass, you might fall in love with the default no-copy approach. While this view-vs-copy business bites many newcomers (especially those coming from MATLAB), I haven't seen complaints from experienced users. Look at other numpy SO questions; you won't see a lot of copy() calls.

Even regular Python users are used to asking themselves whether a reference or slice is a copy or not, and whether something is mutable or not.

For example, with lists:

In [754]: ll=[1,2,[3,4,5],6]
In [755]: llslice=ll[1:-1]
In [756]: llslice[1][1:2]=[10,11,12]
In [757]: ll
Out[757]: [1, 2, [3, 10, 11, 12, 5], 6]

Modifying an item inside a slice modifies that same item in the original list. In contrast to numpy, a list slice is a copy, but it's a shallow copy. You have to make an extra effort to get a deep copy (import copy and call copy.deepcopy).
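
A short sketch of the shallow-versus-deep difference, reusing the list from the session above (the replacement values 20 and 30 are arbitrary):

```python
import copy

ll = [1, 2, [3, 4, 5], 6]

shallow = ll[1:-1]                # new outer list, same inner list object
deep = copy.deepcopy(ll[1:-1])    # inner list is duplicated as well

shallow[0] = 20       # rebinding a slot in the slice leaves ll alone
deep[1][0] = 30       # the deep copy is fully independent of ll
print(ll)             # [1, 2, [3, 4, 5], 6]

shallow[1][0] = 30    # mutating the shared inner list shows up in ll
print(ll)             # [1, 2, [30, 4, 5], 6]
```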

/usr/lib/python3/dist-packages/numpy/lib/index_tricks.py contains some indexing functions aimed at making certain indexing operations more convenient. Several are actually classes, or class instances, with custom __getitem__ methods. They may also serve as models of how to customize your slicing and indexing.
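
As a rough sketch of that pattern: np.s_ and np.r_ are two of those index_tricks instances, and the CopyIndexer class below is a hypothetical helper written in the same style (an object whose __getitem__ does the work). It is one possible way to get copy-on-slice without touching ndarray itself.

```python
import numpy as np

arr = np.arange(0, 11)

sl = np.s_[0:6]             # builds a slice object: slice(0, 6, None)
joined = np.r_[0:3, 7, 9]   # concatenates ranges and scalars into an array

class CopyIndexer:
    """Hypothetical helper: indexing through it always returns a copy."""

    def __init__(self, array):
        self._array = array

    def __getitem__(self, indx):
        return np.array(self._array[indx], copy=True)

slice_of_arr = CopyIndexer(arr)[sl]
slice_of_arr[:] = 99        # arr is not affected

print(joined)        # [0 1 2 7 9]
print(slice_of_arr)  # [99 99 99 99 99 99]
print(arr)           # [ 0  1  2  3  4  5  6  7  8  9 10]
```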

answered Nov 05 '22 19:11

hpaulj
