Can someone explain the Numpy design decision to keep single elements of arrays as distinct from Python scalars?
The following code works without errors
import numpy as np
a = np.array([1, 2, 3])
b = a[0]
print(b.size)
This illustrates that b is not a simple Python scalar; in fact, type(b) gives numpy.int32 instead of int.
Of course, if one defines b = 1, then b.size throws an error:
AttributeError: 'int' object has no attribute 'size'
I find this difference in behaviour confusing, and I am wondering what motivates it.
There is a difference between the elements stored in an array and the object you get back when you index one.
The array has a data buffer: a block of bytes that numpy manages with its own compiled code. Depending on the dtype, individual elements may occupy 1, 4, 8, 16, etc. bytes.
In [478]: A=np.array([1,2,3])
In [479]: A.__array_interface__
Out[479]:
{'data': (167487856, False),
'descr': [('', '<i4')],
'shape': (3,),
'strides': None,
'typestr': '<i4',
'version': 3}
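The 'typestr' value '<i4' here means little-endian 4-byte integers, so each element occupies 4 bytes of the buffer. As a minimal sketch of how the element width follows the dtype (the dtypes listed are just illustrative, and the default dtype varies by platform):

import numpy as np

# The per-element byte width is fixed by the dtype,
# not by the Python ints used to build the array.
for dt in (np.int8, np.int32, np.int64, np.float64, np.complex128):
    a = np.array([1, 2, 3], dtype=dt)
    print(dt.__name__, a.itemsize, a.nbytes)  # name, bytes per element, total bytes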
We can view the data as a sequence of bytes (displayed as characters):
In [480]: A.view('S1')
Out[480]:
array(['\x01', '', '', '', '\x02', '', '', '', '\x03', '', '', ''],
dtype='|S1')
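A slightly more readable way to inspect the same buffer is a uint8 view or tobytes(); this sketch forces dtype=np.int32 to match the '<i4' layout above, and the byte values shown assume a little-endian machine:

import numpy as np

A = np.array([1, 2, 3], dtype=np.int32)  # force 4-byte ints to match '<i4'
print(A.view(np.uint8))                  # [1 0 0 0 2 0 0 0 3 0 0 0] (little-endian)
print(A.tobytes())                       # b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'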
When you select a single element of A you get back a one-element array (or something very much like it):
In [491]: b=A[0]
In [492]: b.shape
Out[492]: ()
In [493]: b.__array_interface__
Out[493]:
{'__ref': array(1),
'data': (167480104, False),
'descr': [('', '<i4')],
'shape': (),
'strides': None,
'typestr': '<i4',
'version': 3}
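Strictly, b is a numpy scalar rather than an ndarray; here is a quick sketch of the distinction (the concrete integer type shown is platform-dependent):

import numpy as np

A = np.array([1, 2, 3])
b = A[0]       # scalar indexing -> a numpy scalar
c = A[0:1]     # slice indexing -> a genuine one-element ndarray (a view)

print(type(b))                    # e.g. <class 'numpy.int64'> (int32 on some platforms)
print(isinstance(b, np.ndarray))  # False: b is not an array
print(isinstance(b, np.generic))  # True: np.generic is the base of numpy scalar types
print(b.shape, c.shape)           # () (1,)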
The type is different, but b has most of the same attributes as A: shape, strides, mean, etc.
You have to use .item() to access the underlying 'scalar':
In [496]: b.item()
Out[496]: 1
In [497]: type(b.item())
Out[497]: int
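A short sketch of the unwrapping options; besides .item(), the builtin int() (or float()) also converts a numpy scalar to its plain Python counterpart:

import numpy as np

b = np.array([1, 2, 3])[0]
print(type(b))          # a numpy scalar type, e.g. numpy.int64
print(type(b.item()))   # <class 'int'> -- a plain Python scalar
print(type(int(b)))     # <class 'int'> -- int() unwraps it too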
So you can think of b as a scalar with a numpy wrapper. The __array_interface__ for b looks very much like that of np.array(1).
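To make that comparison concrete, here is a sketch contrasting b with a true 0-d array (again, the exact default integer dtype is platform-dependent):

import numpy as np

b = np.array([1, 2, 3])[0]   # numpy scalar
z = np.array(1)              # true 0-d ndarray

print(b.shape, z.shape)      # () () -- both are zero-dimensional
print(b.dtype, z.dtype)      # same default integer dtype
print(type(b) is type(z))    # False: numpy scalar type vs ndarray
print(np.asarray(b).shape)   # (): wrapping b gives a 0-d array back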