Python numpy data pointer addresses change without modification

EDIT

After some more fiddling around, I've isolated the following cases so far:

  1. A 1D array gives two different addresses when the variable is entered directly, and only one when using print()
  2. A 2D array (or matrix) gives three different addresses when the variable is entered directly, and two when using print()
  3. A 3D array gives two different addresses when the variable is entered directly, and only one when using print() (apparently the same as with the 1D array)

Like so:

>>> a = numpy.array([1,2,3], dtype="int32")

>>> a.data
<memory at 0x7f02e85e4048>
>>> a.data
<memory at 0x7f02e85e4110>
>>> a.data
<memory at 0x7f02e85e4048>
>>> a.data
<memory at 0x7f02e85e4110>
>>> a.data
<memory at 0x7f02e85e4048>

>>> print(a.data)
<memory at 0x7f02e85e4110>
>>> print(a.data)
<memory at 0x7f02e85e4110>
>>> print(a.data)
<memory at 0x7f02e85e4110>
>>> print(a.data)
<memory at 0x7f02e85e4110>
>>> print(a.data)
<memory at 0x7f02e85e4110>


>>> d = numpy.array([[1,2,3]], dtype="int32")

>>> d.data
<memory at 0x7f02e863ae48>
>>> d.data
<memory at 0x7f02e863a9e8>
>>> d.data
<memory at 0x7f02e863aac8>
>>> d.data
<memory at 0x7f02e863ae48>
>>> d.data
<memory at 0x7f02e863a9e8>
>>> d.data
<memory at 0x7f02e863aac8>

>>> print(d.data)
<memory at 0x7f02e863ae48>
>>> print(d.data)
<memory at 0x7f02e863a9e8>
>>> print(d.data)
<memory at 0x7f02e863ae48>
>>> print(d.data)
<memory at 0x7f02e863a9e8>
>>> print(d.data)
<memory at 0x7f02e863ae48>


>>> b = numpy.matrix([[1,2,3],[4,5,6]], dtype="int32")

>>> b.data
<memory at 0x7f02e863a9e8>
>>> b.data
<memory at 0x7f02e863ae48>
>>> b.data
<memory at 0x7f02e863aac8>
>>> b.data
<memory at 0x7f02e863a9e8>
>>> b.data
<memory at 0x7f02e863ae48>

>>> print(b.data)
<memory at 0x7f02e863aac8>
>>> print(b.data)
<memory at 0x7f02e863a9e8>
>>> print(b.data)
<memory at 0x7f02e863aac8>
>>> print(b.data)
<memory at 0x7f02e863a9e8>
>>> print(b.data)
<memory at 0x7f02e863aac8>


>>> c = numpy.matrix([[1,2,3],[4,5,6],[7,8,9]], dtype="int32")

>>> c.data
<memory at 0x7f02e863aac8>
>>> c.data
<memory at 0x7f02e863a9e8>
>>> c.data
<memory at 0x7f02e863ae48>
>>> c.data
<memory at 0x7f02e863aac8>
>>> c.data
<memory at 0x7f02e863ae48>
>>> c.data
<memory at 0x7f02e863a9e8>
>>> c.data
<memory at 0x7f02e863aac8>

>>> print(c.data)
<memory at 0x7f02e863ae48>
>>> print(c.data)
<memory at 0x7f02e863a9e8>
>>> print(c.data)
<memory at 0x7f02e863ae48>
>>> print(c.data)
<memory at 0x7f02e863a9e8>
>>> print(c.data)
<memory at 0x7f02e863ae48>


>>> e = numpy.array([[[0,1],[2,3]],[[4,5],[6,7]]], dtype="int32")

>>> e.data
<memory at 0x7f8ca0fe1048>
>>> e.data
<memory at 0x7f8ca0fe1140>
>>> e.data
<memory at 0x7f8ca0fe1048>
>>> e.data
<memory at 0x7f8ca0fe1140>
>>> e.data
<memory at 0x7f8ca0fe1048>


>>> print(e.data)
<memory at 0x7f8ca0fe1048>
>>> print(e.data)
<memory at 0x7f8ca0fe1048>
>>> print(e.data)
<memory at 0x7f8ca0fe1048>
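
One thing worth keeping in mind when reading these transcripts: the address in <memory at 0x...> is where the memoryview object itself lives, not where the array's data buffer lives. A minimal sketch, assuming CPython (where id() returns the object's address) and using NumPy's __array_interface__ and ctypes.data to report the actual buffer pointer; the helper repr_address is hypothetical, written only for this illustration:

```python
import numpy as np

a = np.array([1, 2, 3], dtype="int32")


def repr_address(view):
    """Hypothetical helper: parse the hex address out of '<memory at 0x7f02e85e4048>'."""
    return int(repr(view).split()[-1].rstrip(">"), 16)


m = a.data
# In CPython, id() is the object's address, so the repr shows where the
# memoryview object itself lives.
print(repr_address(m) == id(m))  # True

# The array's data pointer is reported separately; it stays put because the
# buffer is never moved, no matter how many views are created.
buffer_address = a.__array_interface__["data"][0]
print(buffer_address == a.ctypes.data)  # True

# A fresh view obtained from a.data still exposes that same buffer (no copy).
fresh = np.frombuffer(a.data, dtype=a.dtype)
print(fresh.__array_interface__["data"][0] == buffer_address)  # True
```

So if the goal is the data pointer itself, a.__array_interface__["data"][0] (or a.ctypes.data) is the stable number to look at.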

ORIGINAL POST

I was under the impression that merely entering a variable on its own in the Python console would echo a string describing its value (and type). It is formatted differently than with print(), but I assumed the values they both show would be the same.

When I try to output the address of the data pointer object of a NumPy array, just entering the variable gives me a different value every other time, while print() gives the same value every time.

That suggests that the difference between the two operations isn't just how the output is formatted, but also where they get their information from. But what exactly do these additional differences consist of?

>>> a = numpy.array([0,1,2])

>>> a
array([0, 1, 2])
>>> print(a)
[0 1 2]

>>> print(a.data)
<memory at 0x7ff25120c110>
>>> print(a.data)
<memory at 0x7ff25120c110>
>>> print(a.data)
<memory at 0x7ff25120c110>

>>> a.data
<memory at 0x7ff25120c110>
>>> a.data
<memory at 0x7ff253099818>
>>> a.data
<memory at 0x7ff25120c110>
>>> a.data
<memory at 0x7ff253099818>
>>> a.data
<memory at 0x7ff25120c110>
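
On the formatting side, the console echo shows repr() of the result while print() shows str(). A minimal sketch, assuming NumPy's default print options, showing that the two differ for an ndarray but coincide for a memoryview, so formatting alone does not explain the changing addresses:

```python
import numpy as np

a = np.array([0, 1, 2])

# Entering `a` at the prompt echoes repr(a); print(a) writes str(a).
print(repr(a))  # array([0, 1, 2])
print(str(a))   # [0 1 2]

# memoryview has no separate __str__, so for one and the same view object
# the two spellings produce identical text, address included.
m = a.data
print(repr(m) == str(m))  # True
```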
asked Oct 19 '22 at 03:10 by lash


1 Answer

The memoryview returned by a.data seems to alternate between two (or more) views. If you store a given instance of a.data, you get consistent output:

>>> a.data
<memory at 0x7fb88ea1f828>
>>> a.data
<memory at 0x7fb88e98c4a8>
>>> t = a.data
>>> a.data
<memory at 0x7fb88e98ce48>
>>> a.data
<memory at 0x7fb88e98c3c8>
>>> a.data
<memory at 0x7fb88e98c4a8>
>>> a.data
<memory at 0x7fb88e98ce48>
>>> a.data
<memory at 0x7fb88e98c3c8>
>>> a.data
<memory at 0x7fb88e98c4a8>
>>> t
<memory at 0x7fb88ea1f828>
>>> t
<memory at 0x7fb88ea1f828>
>>> t
<memory at 0x7fb88ea1f828>

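Note that there are 3 addresses rotating in the above example; I'm pretty sure this is all an implementation detail. Rather than one view being cached, it looks like a new view is generated each time you access a.data, and the memory of views that have already been discarded gets reused, so the same few addresses keep coming back. In the interactive console the last echoed result is also kept alive in the _ variable, which is why echoing alternates between addresses while print() tends to reuse the same one.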
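
This can be checked with a small sketch, assuming CPython: the console passes each echoed result to sys.displayhook, which prints its repr and keeps it alive in builtins._ (the _ variable). The helper echo() below is a hypothetical stand-in for what the console does with an expression statement; it records the address of each view it echoes.

```python
import builtins
import sys

import numpy as np

a = np.array([0, 1, 2])

# Every attribute access builds a brand-new memoryview.
print(a.data is a.data)  # False


def echo(make_value):
    """Hypothetical stand-in for the console evaluating an expression statement."""
    value = make_value()    # created while builtins._ still holds the previous result
    sys.displayhook(value)  # prints repr(value) and rebinds builtins._ to it
    return id(value)        # in CPython, the address shown in '<memory at 0x...>'


echoed = [echo(lambda: a.data) for step in range(6)]

# Objects that are alive at the same time never share an id, so neighbouring
# echoes must report different addresses.
print(all(x != y for x, y in zip(echoed, echoed[1:])))  # True

# With print(), nothing keeps the view alive after the call, so its memory can
# be handed to the next view; CPython often does this, but it is not guaranteed.
printed = []
for step in range(6):
    view = a.data
    printed.append(id(view))
    print(view)
    del view
print(len(set(printed)), "distinct address(es) with print()")
```

The echo case is guaranteed to alternate; how many distinct addresses show up in either case depends on the allocator.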

You can also make certain that you are looking at separate view objects:

>>> id(a.data)
140430643088968
>>> id(a.data)
140430643086280
>>> id(a.data)
140430643088968
>>> id(a.data)
140430643086280

So most of the confusion probably comes from the attribute notation: a.data suggests that we are talking about a fixed object, while in fact every access returns a new view.
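
To make that concrete, a minimal sketch showing that two accesses yield distinct memoryview objects that nevertheless expose the same buffer, so a write through one view shows up in the array and in the other view. It assumes a writeable, native-endian int32 array as in the question, whose buffer format memoryview can index directly:

```python
import numpy as np

a = np.array([1, 2, 3], dtype="int32")

first = a.data
second = a.data

# Two accesses, two different view objects...
print(first is second)  # False

# ...but both describe the same memory.
print(first.shape, first.tolist())  # (3,) [1, 2, 3]

# Writing through one view changes the array and is visible through the other.
first[0] = 42
print(a[0], second[0])  # 42 42
```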
