
Why does Python copy NumPy arrays when the lengths of the dimensions are the same?

Tags: python, numpy

I have a problem with references to a NumPy array. I have a list of arrays of the form

import numpy as np

a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8])]

If I now create a new variable,

b = np.array(a)

and do

b[0] += 1
print(a)

then a does not change:

a = [array([0. , 0.2, 0.4, 0.6, 0.8]),
     array([0. , 0.2, 0.4, 0.6, 0.8]),
     array([0. , 0.2, 0.4, 0.6, 0.8])]

But if I do the same thing with:

a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6])]

where I removed one number from the end of the last array, and then do this again:

b = np.array(a)
b[0] += 1
print(a)

Now a changes, which is what I thought was the normal behavior in Python.

a = [array([1. , 1.2, 1.4, 1.6, 1.8]),
     array([0. , 0.2, 0.4, 0.6, 0.8]),
     array([0. , 0.2, 0.4, 0.6])]

Can anybody explain this to me?

880

asked Feb 19 '19 07:02

sholli

2 Answers

In [1]: a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
   ...:      np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
   ...:      np.array([0.0, 0.2, 0.4, 0.6, 0.8])]

In [2]: a
Out[2]:
[array([0. , 0.2, 0.4, 0.6, 0.8]),
 array([0. , 0.2, 0.4, 0.6, 0.8]),
 array([0. , 0.2, 0.4, 0.6, 0.8])]

a is a list of arrays. b is a 2d array.

In [3]: b = np.array(a)

In [4]: b
Out[4]:
array([[0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8]])

In [5]: b[0] += 1

In [6]: b
Out[6]:
array([[1. , 1.2, 1.4, 1.6, 1.8],
       [0. , 0.2, 0.4, 0.6, 0.8],
       [0. , 0.2, 0.4, 0.6, 0.8]])

b gets values from a but does not contain any of the a objects. The underlying data structure of this b is very different from a, the list. If that isn't clear, you may want to review the numpy basics (which talk about shape, strides, and data buffers).
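A minimal sketch of that point, using the same data as the question: the 2d array owns a fresh data buffer, so none of the arrays in a share memory with it. The variable names shares and unchanged are only illustrative.

```python
import numpy as np

# Three equal-length 1d arrays, as in the first case of the question.
a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]) for _ in range(3)]

# np.array stacks the values into a new (3, 5) float64 buffer.
b = np.array(a)

# None of the original arrays shares memory with b.
shares = any(np.shares_memory(b, row) for row in a)

# Modifying b therefore leaves the list elements alone.
b[0] += 1
unchanged = np.array_equal(a[0], [0.0, 0.2, 0.4, 0.6, 0.8])

print(b.dtype, b.shape, shares, unchanged)
```

Here shares should come out False and unchanged True, which matches the behavior seen in the first case.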

In the second case, b is an object array, containing the same objects as a:

In [8]: b = np.array(a)

In [9]: b
Out[9]:
array([array([0. , 0.2, 0.4, 0.6, 0.8]), array([0. , 0.2, 0.4, 0.6, 0.8]),
       array([0. , 0.2, 0.4, 0.6])], dtype=object)

This b behaves a lot like a: both contain arrays.
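The sharing can be checked directly with an identity test. This is a small sketch rather than a transcript of the session above. It passes dtype=object explicitly, because recent NumPy releases (1.24 and later) raise a ValueError for ragged input instead of silently building an object array.

```python
import numpy as np

# Ragged input, as in the second case of the question.
a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6, 0.8]),
     np.array([0.0, 0.2, 0.4, 0.6])]

# Explicit dtype=object keeps this working on newer NumPy versions.
b = np.array(a, dtype=object)

# Each slot of b holds a reference to the very same array object as in a.
same_object = b[0] is a[0]

# The in-place add runs on that shared array, so a[0] changes as well.
b[0] += 1

print(b.shape, same_object, a[0])
```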

The construction of this object array is quite different from the 2d numeric array. I think of the numeric array as the default, or normal, numpy behavior, while the object array is a 'concession', giving us a useful tool, but one which does not have the calculation power of the multidimensional array.

It is easy to make an object array by mistake - some say too easy. It can be harder to make one reliably by design. For example, with the original a, we have to do:

In [17]: b = np.empty(3, object)

In [18]: b[:] = a[:]

In [19]: b
Out[19]:
array([array([0. , 0.2, 0.4, 0.6, 0.8]), array([0. , 0.2, 0.4, 0.6, 0.8]),
       array([0. , 0.2, 0.4, 0.6, 0.8])], dtype=object)

or even for i in range(3): b[i] = a[i]
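The same recipe can be wrapped in a small function. The name as_object_array is hypothetical (it is not a NumPy API); the sketch just preallocates an object array and fills it slot by slot, so the result is 1d regardless of whether the items happen to have equal lengths.

```python
import numpy as np

def as_object_array(items):
    """Hypothetical helper: build a 1d object array holding the given items."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # stores a reference, no data is copied
    return out

# Equal-length arrays, which np.array(a) would stack into a (3, 5) array.
a = [np.array([0.0, 0.2, 0.4, 0.6, 0.8]) for _ in range(3)]
b = as_object_array(a)

print(np.array(a).shape)  # 2d numeric array
print(b.shape, b.dtype)   # 1d object array
```

Because the slots hold references, b[0] is a[0] is expected to be True here, just as in the ragged case.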

117

answered Sep 18 '22 15:09

hpaulj

In a nutshell, this is a consequence of your data. Whether this works or does not work (depending on how you view it) comes down to whether your arrays are equally sized.

With equal sized sub-arrays, the elements can be compactly loaded into a memory-efficient scheme in which any N-D array is represented by a compact 1-D buffer in memory. NumPy then handles the translation of multi-dimensional indexes to 1-D indexes internally. For example, index [i, j] of a 2D array with N columns maps to i*N + j (when stored in row-major order). The data from the original list of arrays is copied into that compact 1-D buffer, so any modifications made to this array do not affect the original.
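As a rough illustration of the i*N + j mapping, the sketch below builds a small C-ordered (row-major) array and compares 2D indexing with manual indexing into the flattened buffer. The shape and the names n_cols, i, j are chosen only for the example.

```python
import numpy as np

# A 3x5 float64 array stored in row-major (C) order.
b = np.arange(15, dtype=np.float64).reshape(3, 5)
n_cols = b.shape[1]

# ravel() exposes the elements in row-major order as a 1d array.
flat = b.ravel()

i, j = 2, 3
manual = flat[i * n_cols + j]  # index computed by hand
direct = b[i, j]               # index computed by NumPy

# NumPy offers the same mapping as a function.
flat_index = np.ravel_multi_index((i, j), b.shape)

# Strides are in bytes: 5 * 8 to move one row, 8 to move one column.
print(manual, direct, flat_index, b.strides)
```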

With ragged lists/arrays, this cannot be done. The array is effectively a Python list, where each element is a reference to a Python object. For efficiency, only the object references are copied and not the data. This is why you can mutate the original list elements in the second case but not the first.
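If independent data is wanted in the ragged case, one option is a deep copy. The sketch below uses shorter arrays than the question for brevity and assumes dtype=object is passed explicitly. It contrasts ndarray.copy(), which only duplicates the references, with copy.deepcopy, which also duplicates the arrays they point to.

```python
import copy
import numpy as np

# A small ragged list of arrays.
a = [np.array([0.0, 0.2]), np.array([0.0, 0.2, 0.4])]
b = np.array(a, dtype=object)

# A shallow copy still points at the arrays in a.
shallow = b.copy()
shallow_shares = shallow[0] is a[0]

# A deep copy gets its own arrays.
deep = copy.deepcopy(b)
deep_shares = deep[0] is a[0]

# Changing the deep copy leaves the original list untouched.
deep[0] += 1

print(shallow_shares, deep_shares, a[0], deep[0])
```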

answered Sep 21 '22 15:09

cs95
