This isn't so much a problem as a curiosity.
In my interpreter on 64-bit Linux I can execute:
In [10]: np.int64 == np.int64
Out[10]: True
In [11]: np.int64 is np.int64
Out[11]: True
Great, just what I would expect. However, I found this weird property of the numpy.core.numeric module:
In [19]: from numpy.core.numeric import _typelessdata
In [20]: _typelessdata
Out[20]: [numpy.int64, numpy.float64, numpy.complex128, numpy.int64]
Weird. Why is numpy.int64 in there twice? Let's investigate.
In [23]: _typelessdata[0] is _typelessdata[-1]
Out[23]: False
In [24]: _typelessdata[0] == _typelessdata[-1]
Out[24]: False
In [25]: id(_typelessdata[-1])
Out[25]: 139990931572128
In [26]: id(_typelessdata[0])
Out[26]: 139990931572544
In [27]: _typelessdata[-1]
Out[27]: numpy.int64
In [28]: _typelessdata[0]
Out[28]: numpy.int64
Whoa, they are different. What is going on here? Why are there two np.int64's?
The distinction between integer types is defined by the number of bytes in the integer (int32 vs. int64), with more bytes holding larger numbers, as well as whether the number is signed or unsigned (int32 vs. uint32), with unsigned types able to hold larger positive numbers but unable to hold negative numbers.
NumPy's int64 (which is what int_ is on my machine) is an integer type represented by 8 bytes (64 bits), and anything beyond that range cannot be represented.
The default data type is float_. The 24 built-in array scalar type objects all convert to an associated data-type object.
int32 is a 32-bit signed integer whose values lie on the interval [−2,147,483,648, +2,147,483,647].
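As an illustrative aside (not part of the original question), np.iinfo reports these ranges directly:
import numpy as np

# np.iinfo reports the representable range of each integer dtype;
# unsigned types trade the sign bit for a larger positive range.
print(np.iinfo(np.int32))   # min = -2147483648, max = 2147483647
print(np.iinfo(np.uint32))  # min = 0, max = 4294967295
print(np.iinfo(np.int64))   # min = -9223372036854775808, max = 9223372036854775807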
Here are the lines where _typelessdata is constructed within numeric.py:
_typelessdata = [int_, float_, complex_]
if issubclass(intc, int):
    _typelessdata.append(intc)
if issubclass(longlong, int):
    _typelessdata.append(longlong)
intc is a C-compatible (32-bit) signed integer, and int is a native Python integer, which may be either 32-bit or 64-bit depending on the platform. On a 32-bit system the native Python int type is also 32-bit, so issubclass(intc, int) returns True and intc gets appended to _typelessdata, which ends up looking like this:
[numpy.int32, numpy.float64, numpy.complex128, numpy.int32]
Note that _typelessdata[-1] is numpy.intc, not numpy.int32.
On a 64-bit system, int is 64-bit, and therefore issubclass(longlong, int) returns True and longlong gets appended to _typelessdata, resulting in:
[numpy.int64, numpy.float64, numpy.complex128, numpy.int64]
In this case, as Joe pointed out, (_typelessdata[-1] is numpy.longlong) == True.
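For what it's worth, a small check along these lines (a sketch, assuming a NumPy version where numpy.core.numeric still exposes _typelessdata) makes the two look-alike entries distinguishable through their dtype character codes:
import numpy as np
from numpy.core.numeric import _typelessdata

# int_ maps to C long (dtype char 'l') while longlong maps to C long long
# (dtype char 'q'), even though both are 64 bits wide on this platform.
for t in _typelessdata:
    print(t, np.dtype(t).char, np.dtype(t).itemsize)

print(_typelessdata[-1] is np.longlong)  # expected True on a 64-bit build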
The bigger question is why the contents of _typelessdata are set like this. The only place I could find in the numpy source where _typelessdata is actually used is this line within the definition of np.array_repr in the same file:
skipdtype = (arr.dtype.type in _typelessdata) and arr.size > 0
The purpose of _typelessdata is to ensure that np.array_repr correctly prints the string representation of arrays whose dtype happens to be the same as the (platform-dependent) native Python integer type. For example, on a 32-bit system, where int is 32-bit:
In [1]: np.array_repr(np.intc([1]))
Out[1]: 'array([1])'
In [2]: np.array_repr(np.longlong([1]))
Out[2]: 'array([1], dtype=int64)'
whereas on a 64-bit system, where int is 64-bit:
In [1]: np.array_repr(np.intc([1]))
Out[1]: 'array([1], dtype=int32)'
In [2]: np.array_repr(np.longlong([1]))
Out[2]: 'array([1])'
The arr.dtype.type in _typelessdata check in the line above ensures that printing the dtype is skipped for the appropriate platform-dependent native integer dtypes.
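To make that concrete, here is a small sketch mirroring the skipdtype check (the helper name show_repr_decision is made up for illustration, and it again assumes _typelessdata is importable as above):
import numpy as np
from numpy.core.numeric import _typelessdata

def show_repr_decision(arr):
    # Same condition as the line quoted from np.array_repr: omit the dtype
    # when the array's scalar type is one of the "typeless" native types.
    skipdtype = arr.dtype.type in _typelessdata and arr.size > 0
    print(arr.dtype, "-> dtype omitted" if skipdtype else "-> dtype printed")

show_repr_decision(np.array([1]))                 # native int dtype: omitted
show_repr_decision(np.array([1], dtype=np.int8))  # non-native dtype: printed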
I don't know the full history behind it, but the second int64 is actually numpy.longlong.
In [1]: import numpy as np
In [2]: from numpy.core.numeric import _typelessdata
In [3]: _typelessdata
Out[3]: [numpy.int64, numpy.float64, numpy.complex128, numpy.int64]
In [5]: id(_typelessdata[-1]) == id(np.longlong)
Out[5]: True
numpy.longlong is supposed to directly correspond to C's long long type. C's long long is specified to be at least 64 bits wide, but the exact definition is left up to the compiler. My guess is that numpy.longlong winds up being another instance of numpy.int64 on most systems, but is allowed to be something different if the C compiler defines long long as something wider than 64 bits.
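If you want to check that guess on your own machine, a hedged sketch like this shows that longlong is value-compatible with int64 while remaining a distinct scalar type object:
import numpy as np

# On most platforms long long is exactly 64 bits, so the dtypes compare
# equal even though the scalar type objects themselves are distinct.
print(np.dtype(np.longlong).itemsize * 8)           # typically 64
print(np.dtype(np.longlong) == np.dtype(np.int64))  # True when long long is 64-bit
print(np.longlong is np.int64)                      # False on 64-bit Linux, where int64 maps to C long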