Comparing NumPy object references

Q: How do you check if two objects are identical in Python?

The is keyword is used to test if two variables refer to the same object. The test returns True if the two objects are the same object. The test returns False if they are not the same object, even if the two objects are 100% equal. Use the == operator to test if two variables are equal.

Q: How do I compare vectors in numpy?

To check if two NumPy arrays A and B are equal: Use a comparison operator (==) to form a comparison array. Check if all the elements in the comparison array are True.

I want to understand the NumPy behavior.

When I try to get a reference to an inner array of a NumPy array and then compare it to the object itself, the result is False.

Here is the example:

In [198]: x = np.array([[1,2,3], [4,5,6]])
In [201]: x0 = x[0]
In [202]: x0 is x[0]
Out[202]: False

On the other hand, with native Python objects, the result is True.

In [205]: c = [[1,2,3],[1]]    
In [206]: c0 = c[0]    
In [207]: c0 is c[0]
Out[207]: True

My question: is this the intended behavior of NumPy? If so, what should I do if I want to create a reference to the inner objects of a NumPy array?

asked May 10 '17 06:05

nanangarsyad

1 Answer

2d slicing

When I first wrote this I constructed and indexed a 1d array. But the OP is working with a 2d array, so x[0] is a 'row', a slice of the original.

In [81]: arr = np.array([[1,2,3], [4,5,6]])
In [82]: arr.__array_interface__['data']
Out[82]: (181595128, False)

In [83]: x0 = arr[0,:]
In [84]: x0.__array_interface__['data']
Out[84]: (181595128, False)        # same databuffer pointer
In [85]: id(x0)
Out[85]: 2886887088
In [86]: x1 = arr[0,:]             # another slice, different id
In [87]: x1.__array_interface__['data']
Out[87]: (181595128, False)
In [88]: id(x1)
Out[88]: 2886888888
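The session above can be condensed into a small check. This is only a sketch: the names a and b are illustrative, and np.shares_memory and the .base attribute are used here as one way to ask "do these refer to the same data?", which is usually the question behind an `is` test on arrays.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

a = x[0]   # each indexing operation builds a new ndarray object
b = x[0]

distinct_objects = a is not b          # two separate Python objects
same_memory = np.shares_memory(a, b)   # but both look at x's buffer
a_is_view_of_x = a.base is x           # the view records which array owns the data

a[0] = 100                             # writing through one view
print(x[0, 0], b[0])                   # shows up in x and in the other view
```

So a name bound to x[0] already behaves like a reference to the row's data, even though `is` reports two different objects.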

What I wrote earlier about slices still applies. Indexing an individual element, as with arr[0,0], works the same as with a 1d array.

This 2d arr has the same databuffer as the 1d arr.ravel(); the shape and strides are different. And the distinction between view, copy and item still applies.

A common way of implementing 2d arrays in C is to have an array of pointers to other arrays. numpy takes a different, strided approach, with just one flat array of data, and uses shape and strides parameters to implement the traversal. So a subarray requires its own shape and strides as well as a pointer to the shared databuffer.
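To make the strided layout concrete, here is a minimal sketch. The dtype is forced to int32 to match the 4-byte items in this answer (the default integer size varies by platform), and the variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print(arr.shape, arr.strides)      # (2, 3) (12, 4): 12 bytes per row, 4 per column

row = arr[1]                       # a view with its own shape and strides
col = arr[:, 1]
print(row.shape, row.strides)      # (3,) (4,)
print(col.shape, col.strides)      # (2,) (12,)

# The row view starts one row stride (12 bytes) into the shared buffer.
base_ptr = arr.__array_interface__['data'][0]
row_offset = row.__array_interface__['data'][0] - base_ptr

# For a contiguous array, ravel() is another view over the same buffer.
flat = arr.ravel()
flat_shares = np.shares_memory(flat, arr)
```

Every one of these objects is just a (pointer, shape, strides) description of the same six integers.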

1d array indexing

I'll try to illustrate what is going on when you index an array:

In [51]: arr = np.arange(4)

The array is an object with various attributes such as shape, and a data buffer. The buffer stores the data as bytes (in a C array), not as Python numeric objects. You can see information on the array with:

In [52]: np.info(arr)
class:  ndarray
shape:  (4,)
strides:  (4,)
itemsize:  4
aligned:  True
contiguous:  True
fortran:  True
data pointer: 0xa84f8d8
byteorder:  little
byteswap:  False
type: int32

In [53]: arr.__array_interface__
Out[53]: 
{'data': (176486616, False),
 'descr': [('', '<i4')],
 'shape': (4,),
 'strides': None,
 'typestr': '<i4',
 'version': 3}

One has the data pointer in hex, the other decimal. We usually don't reference it directly.

If I index an element, I get a new object:

In [54]: x1 = arr[1]
In [55]: type(x1)
Out[55]: numpy.int32
In [56]: x1.__array_interface__
Out[56]: 
{'__ref': array(1),
 'data': (181158400, False),
....}
In [57]: id(x1)
Out[57]: 2946170352

It has some properties of an array, but not all. For example you can't assign to it. Notice also that its 'data' value is totally different.

Make another selection from the same place - different id and different data:

In [58]: x2 = arr[1]
In [59]: id(x2)
Out[59]: 2946170336
In [60]: x2.__array_interface__['data']
Out[60]: (181143288, False)

Also if I change the array at this point, it does not affect the earlier selections:

In [61]: arr[1] = 10
In [62]: arr
Out[62]: array([ 0, 10,  2,  3])
In [63]: x1
Out[63]: 1

x1 and x2 don't have the same id, and thus won't match with is, and they don't use the arr data buffer either. There's no record that either variable was derived from arr.

With slicing it is possible to get a view of the original array:

In [64]: y = arr[1:2]
In [65]: y.__array_interface__
Out[65]: 
{'data': (176486620, False),
 'descr': [('', '<i4')],
 'shape': (1,),
 ....}
In [66]: y
Out[66]: array([10])
In [67]: y[0]=4
In [68]: arr
Out[68]: array([0, 4, 2, 3])
In [69]: x1
Out[69]: 1

Its data pointer is 4 bytes larger than arr's - that is, it points to the same buffer, just a different spot. And changing y does change arr (but not the independent x1).

I could even make a 0d view of this item:
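The item, view and copy cases can be put side by side. This is a sketch with made-up names; the list index arr[[1]] is used as one example of advanced indexing, which returns a copy rather than a view.

```python
import numpy as np

arr = np.arange(4)

item = arr[1]      # NumPy scalar: an independent value
view = arr[1:2]    # basic slice: a view into arr's buffer
copy = arr[[1]]    # advanced (list) indexing: a new array with its own buffer

arr[1] = 10        # only the view follows the change

view_shares = np.shares_memory(arr, view)
copy_shares = np.shares_memory(arr, copy)
print(int(item), view.tolist(), copy.tolist())
print(view_shares, copy_shares)
```

Only the slice tracks later changes to arr, so a slice is the thing to hold on to when a "reference" into the array is wanted.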

In [71]: z = y.reshape(())
In [72]: z
Out[72]: array(4)
In [73]: z[...]=0
In [74]: arr
Out[74]: array([0, 0, 2, 3])

In Python code we normally don't work with objects like this. When we use the c-api or cython it is possible to access the data buffer directly. nditer is an iteration mechanism that works with 0d objects like this (either in Python or the c-api). In cython, typed memoryviews are particularly useful for low level access.

http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html

https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html

https://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#c.NpyIter
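As a small illustration of the nditer point, the sketch below follows the pattern in the NumPy iteration documentation. Each element arrives as a 0d array that refers to the original buffer, so assigning through [...] modifies arr in place. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(4)
seen_ndims = set()

# op_flags=['readwrite'] asks for writable operands; the context manager
# makes sure any buffered changes are written back when iteration ends.
with np.nditer(arr, op_flags=['readwrite']) as it:
    for elem in it:
        seen_ndims.add(elem.ndim)   # each elem is a 0d array
        elem[...] = 2 * elem        # write back into arr's buffer

print(arr.tolist())
```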

elementwise ==

In response to this comment:

np.array([1]) == np.array([2]) will return array([False], dtype=bool)

== is defined for arrays as an elementwise operation. It compares the values of the respective elements and returns a matching boolean array.

If such a comparison needs to be used in a scalar context (such as an if) it needs to be reduced to a single value, as with np.all or np.any.
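A short sketch of the reduction step, using illustrative arrays a and b. np.array_equal is included as a convenience that also handles shape mismatches.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 4])

mask = a == b                        # elementwise boolean array
all_equal = bool(np.all(mask))       # every element matches?
any_equal = bool(np.any(mask))       # at least one element matches?
same = np.array_equal(a, b)          # shape and values both match?

# Using the unreduced mask directly in an `if` is ambiguous and raises.
try:
    if a == b:
        pass
    raised = False
except ValueError:
    raised = True

print(mask.tolist(), all_equal, any_equal, same, raised)
```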

The is test compares object ids (not just for numpy objects). It has limited value in practical coding. I use it most often in expressions like is None, where None is an object with a unique id, and which does not play nicely with equality tests.
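A typical `is None` use looks like the sketch below. The function scaled and its out parameter are hypothetical, invented only for this illustration.

```python
import numpy as np

def scaled(values, out=None):
    """Hypothetical helper: double `values`, optionally into a caller-supplied buffer."""
    # `out == None` would be evaluated elementwise when `out` is an array,
    # so the identity test is the reliable way to detect "no buffer given".
    if out is None:
        out = np.empty_like(values)
    np.multiply(values, 2, out=out)
    return out

values = np.arange(3)
buf = np.empty_like(values)

reused = scaled(values, out=buf)   # writes into buf and returns it
fresh = scaled(values)             # allocates a new array

print(reused is buf, fresh is buf, fresh.tolist())
```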

answered Oct 13 '22 22:10

hpaulj
