Can someone explain the Numpy design decision to keep single elements of arrays as distinct from Python scalars?
The following code runs without errors:
import numpy as np
a = np.array([1, 2, 3])
b = a[0]
print(b.size)
This illustrates that b is not a simple Python scalar, and in fact type(b) gives numpy.int32 instead of int.
Of course, if one defines b = 1, then b.size throws an error:
AttributeError: 'int' object has no attribute 'size'
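To make the contrast concrete, here is a minimal check (on my machine the default integer dtype is 32-bit; it may be numpy.int64 elsewhere):
import numpy as np

a = np.array([1, 2, 3])
b = a[0]
print(type(b))  # <class 'numpy.int32'> here; numpy.int64 on 64-bit default platforms
print(type(1))  # <class 'int'>
print(b + 1)    # 2 -- arithmetic still behaves like a plain int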
I find this difference in behaviour confusing, and I am wondering what its motivation is.
There is a difference between the elements stored in an array and the object you get back when you index the array.
The array has a data buffer: a block of bytes that numpy manages with its own compiled code. Individual elements may be represented by 1, 4, 8, or 16 bytes, etc., depending on the dtype.
In [478]: A=np.array([1,2,3])
In [479]: A.__array_interface__
Out[479]: 
{'data': (167487856, False),
 'descr': [('', '<i4')],
 'shape': (3,),
 'strides': None,
 'typestr': '<i4',
 'version': 3}
View the data as a list of bytes (displayed as characters):
In [480]: A.view('S1')
Out[480]: 
array(['\x01', '', '', '', '\x02', '', '', '', '\x03', '', '', ''], 
      dtype='|S1')
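As a cross-check, the element size and total buffer size follow from the dtype (a small sketch; '<i4' above means little-endian 4-byte integers, though the default dtype may differ by platform):
In [481]: A.dtype.itemsize
Out[481]: 4
In [482]: A.nbytes          # 3 elements of 4 bytes each
Out[482]: 12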
When you select an element of A you get back a 0-d, array-like object (a numpy scalar):
In [491]: b=A[0]
In [492]: b.shape
Out[492]: ()
In [493]: b.__array_interface__
Out[493]: 
{'__ref': array(1),
 'data': (167480104, False),
 'descr': [('', '<i4')],
 'shape': (),
 'strides': None,
 'typestr': '<i4',
 'version': 3}
The type is different, but b has most of the same attributes as A: shape, strides, mean, etc.
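For example, continuing the same session (and assuming the same '<i4' dtype as above), the numpy scalar answers array-style queries:
In [494]: b.dtype
Out[494]: dtype('int32')
In [495]: b.mean()
Out[495]: 1.0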
You have to use .item() to access the underlying Python scalar:
In [496]: b.item()
Out[496]: 1
In [497]: type(b.item())
Out[497]: int
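For completeness, the array itself offers .item() and .tolist() as well, which also return plain Python scalars (a quick sketch in the same session):
In [498]: A.tolist()
Out[498]: [1, 2, 3]
In [499]: type(A.item(0))
Out[499]: int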
So you can think of b as a scalar with a numpy wrapper. Its __array_interface__ looks very much like that of np.array(1).
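A quick type check makes the "scalar with a numpy wrapper" idea explicit; np.generic is the common base class of the numpy scalar types (again a sketch in the same session):
In [500]: isinstance(b, np.ndarray)
Out[500]: False
In [501]: isinstance(b, np.generic)
Out[501]: True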