
Why does Python copy NumPy arrays when the lengths of the dimensions are the same?

Tags: python, numpy

I have a problem with references to NumPy arrays. I have a list of arrays of the form

import numpy as np

a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8])]

If I now create a new variable,

b = np.array(a) 

and do

b[0] += 1
print(a)

then a does not change.

a = [array([0. , 0.2, 0.4, 0.6, 0.8]),
     array([0. , 0.2, 0.4, 0.6, 0.8]),
     array([0. , 0.2, 0.4, 0.6, 0.8])]

But if I do the same thing with:

a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6])]

where I removed one number from the end of the last array, and then run the same code again:

b = np.array(a)
b[0] += 1
print(a)

Now a does change, which is what I thought was the normal behavior in Python.

a = [array([1. , 1.2, 1.4, 1.6, 1.8]),
     array([0. , 0.2, 0.4, 0.6, 0.8]),
     array([0. , 0.2, 0.4, 0.6])]

Can anybody explain this to me?

sholli asked Feb 19 '19 07:02




2 Answers

In [1]: a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
   ...:      np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
   ...:      np.array([0.0, 0.2, 0.4, 0.6, 0.8])]

In [2]: a
Out[2]:
[array([0. , 0.2, 0.4, 0.6, 0.8]),
 array([0. , 0.2, 0.4, 0.6, 0.8]),
 array([0. , 0.2, 0.4, 0.6, 0.8])]

a is a list of arrays. b is a 2d array.

In [3]: b = np.array(a)

In [4]: b
Out[4]:
array([[0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8]])

In [5]: b[0] += 1

In [6]: b
Out[6]:
array([[1. , 1.2, 1.4, 1.6, 1.8],
       [0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8]])

b gets its values from a but does not contain any of the array objects in a. The underlying data structure of this b is very different from that of the list a. If that isn't clear, you may want to review the numpy basics (which talk about shape, strides, and data buffers).
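The point that b holds copied values rather than the objects in a can be checked directly. The following is a minimal sketch using the arrays from the question; np.shares_memory is used here only as one way to confirm that no data buffer is shared.

```python
import numpy as np

a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8])]

# Equal lengths: the values are copied into one new 3x5 numeric buffer.
b = np.array(a)

print(b.shape, b.dtype)                         # (3, 5) float64
print(any(np.shares_memory(b, x) for x in a))   # False

# The in-place add touches only b's own buffer, so a[0] is unchanged.
b[0] += 1
print(np.allclose(a[0], [0.0, 0.2, 0.4, 0.6, 0.8]))   # True
```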

In the second case, b is an object array, containing the same objects as a:

In [8]: b = np.array(a)

In [9]: b
Out[9]:
array([array([0. , 0.2, 0.4, 0.6, 0.8]), array([0. , 0.2, 0.4, 0.6, 0.8]),
       array([0. , 0.2, 0.4, 0.6])], dtype=object)

This b behaves a lot like the list a: both contain arrays.
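A short sketch of the ragged case shows that the object array holds the very same array objects as the list. One assumption to flag: dtype=object is passed explicitly here, which the session above does not do. As far as I know, recent NumPy releases (1.24 and later) raise a ValueError for ragged input without it, while older releases built the object array implicitly.

```python
import numpy as np

a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6])]

# Explicit dtype=object (an addition to the original session) so that
# the ragged input is accepted on newer NumPy versions as well.
b = np.array(a, dtype=object)

print(b.shape, b.dtype)   # (3,) object
print(b[0] is a[0])       # True

# b[0] is the same ndarray as a[0], so the in-place add is visible through a.
b[0] += 1
print(a[0][0])            # 1.0
```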

The construction of this object array is quite different from the 2d numeric array. I think of the numeric array as the default, or normal, numpy behavior, while the object array is a 'concession', giving us a useful tool, but one which does not have the calculation power of the multidimensional array.

It is easy to make an object array by mistake - some say too easy. It can be harder to make one reliably by design. For example, with the original a, we have to do:

In [17]: b = np.empty(3, object)

In [18]: b[:] = a[:]

In [19]: b
Out[19]:
array([array([0. , 0.2, 0.4, 0.6, 0.8]), array([0. , 0.2, 0.4, 0.6, 0.8]),
       array([0. , 0.2, 0.4, 0.6, 0.8])], dtype=object)

or even for i in range(3): b[i] = a[i]
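The element-by-element loop can be wrapped in a small helper. This is only a sketch: as_object_array is a hypothetical name, not a NumPy function, and it assumes a 1-D object array is what is wanted regardless of whether the items have equal lengths.

```python
import numpy as np

def as_object_array(items):
    """Return a 1-D object array whose elements are the given items themselves.

    Hypothetical helper, not part of NumPy.
    """
    items = list(items)
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item   # stores a reference to item; no element data is copied
    return out

# Equal-length arrays, which np.array(a) would turn into a 2d numeric array.
a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]) for _ in range(3)]
b = as_object_array(a)

print(b.shape, b.dtype)   # (3,) object
print(b[1] is a[1])       # True
```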

hpaulj answered Sep 18 '22 15:09



In a nutshell, this is a consequence of your data. You'll notice that this works/does not work (depending on how you view it) because your arrays are not equally sized.

With equal-sized sub-arrays, the elements can be packed into a memory-efficient layout in which any N-D array is represented by a compact 1-D buffer. NumPy then translates multi-dimensional indexes to 1-D indexes internally. For example, index [i, j] of a 2D array with N columns maps to i*N + j (if stored in row-major order). The data from the original list of arrays is copied into this compact 1-D buffer, so modifications made to the new array do not affect the original.
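The i*N + j mapping can be checked on a small example. This sketch assumes a C-contiguous (row-major) float64 array, which is what np.arange followed by reshape produces; the variable names are illustrative.

```python
import numpy as np

b = np.arange(15, dtype=np.float64).reshape(3, 5)   # row-major by default
M, N = b.shape      # M rows, N columns
flat = b.ravel()    # for a contiguous array this is a 1-D view of the same buffer

i, j = 2, 3
print(b[i, j] == flat[i * N + j])   # True

# Strides are the byte steps per axis: N * itemsize between rows,
# itemsize between neighbouring columns.
print(b.strides)                    # (40, 8)
```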

With ragged lists/arrays, this cannot be done. The array is effectively a Python list, where each element is a Python object. For efficiency, only the object references are copied, not the data. This is why you can mutate the original list elements in the second case but not in the first.
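If the sharing in the ragged case is not wanted, one option is to copy each sub-array before storing it. This is a sketch of that idea, not something the answers above prescribe; it fills the object array element by element so it does not depend on how np.array treats ragged input.

```python
import numpy as np

a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6])]

# Store a copy of each sub-array instead of a reference to the original.
b = np.empty(len(a), dtype=object)
for i, x in enumerate(a):
    b[i] = x.copy()

b[0] += 1
print(a[0][0])   # 0.0
print(b[0][0])   # 1.0
```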

cs95 answered Sep 21 '22 15:09
