Why are there two np.int64s in numpy.core.numeric._typelessdata (Why is numpy.int64 not numpy.int64?)

Tags: python, numpy

This isn't so much a problem as a curiosity.

In my interpreter on 64-bit Linux I can execute:

In [10]: np.int64 == np.int64
Out[10]: True

In [11]: np.int64 is np.int64
Out[11]: True

Great, just what I would expect. However, I found this odd property of the numpy.core.numeric module:

In [19]: from numpy.core.numeric import _typelessdata

In [20]: _typelessdata
Out[20]: [numpy.int64, numpy.float64, numpy.complex128, numpy.int64]

Weird. Why is numpy.int64 in there twice? Let's investigate.

In [23]: _typelessdata[0] is _typelessdata[-1]
Out[23]: False
In [24]: _typelessdata[0] == _typelessdata[-1]
Out[24]: False
In [25]: id(_typelessdata[-1])
Out[25]: 139990931572128
In [26]: id(_typelessdata[0])
Out[26]: 139990931572544
In [27]: _typelessdata[-1]
Out[27]: numpy.int64
In [28]: _typelessdata[0]
Out[28]: numpy.int64

Whoa, they are different. What is going on here? Why are there two np.int64s?

asked Feb 11 '15 13:02 by Erotemic


2 Answers

Here are the lines where _typelessdata is constructed within numeric.py:

_typelessdata = [int_, float_, complex_]
if issubclass(intc, int):
    _typelessdata.append(intc)

if issubclass(longlong, int):
    _typelessdata.append(longlong)

intc is a C-compatible (32-bit) signed integer, and int is the native Python integer, which may be either 32-bit or 64-bit depending on the platform.

  • On a 32-bit system the native Python int type is also 32-bit, so issubclass(intc, int) returns True and intc gets appended to _typelessdata, which ends up looking like this:

    [numpy.int32, numpy.float64, numpy.complex128, numpy.int32]
    

    Note that _typelessdata[-1] is numpy.intc, not numpy.int32, even though it prints as numpy.int32.

  • On a 64-bit system, int is 64-bit, and therefore issubclass(longlong, int) returns True and longlong gets appended to _typelessdata, resulting in:

    [numpy.int64, numpy.float64, numpy.complex128, numpy.int64]
    

    In this case, as Joe pointed out, (_typelessdata[-1] is numpy.longlong) == True.
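
The two issubclass checks quoted above can be re-run directly. This is a minimal sketch, assuming only that NumPy is installed; the helper name subclass_report is made up for illustration and is not part of NumPy. The platform-dependent results described above apply where NumPy's integer scalar types inherit from the native int. On Python 3 they do not subclass int at all, so both checks come out False there and nothing extra would be appended.

```python
import numpy as np

def subclass_report():
    """Return {name: issubclass(type, int)} for the two types numeric.py tests.

    subclass_report is a hypothetical helper for this example only.
    """
    return {
        "intc": issubclass(np.intc, int),
        "longlong": issubclass(np.longlong, int),
    }

report = subclass_report()
for name, is_sub in report.items():
    # The value depends on the Python version and the platform's C type sizes.
    print(name, "subclasses int:", is_sub)
```

Whichever entry reports True is the one that would be appended as the extra element of the list.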


The bigger question is why the contents of _typelessdata are set like this. The only place I could find in the numpy source where _typelessdata is actually used is this line within the definition for np.array_repr in the same file:

skipdtype = (arr.dtype.type in _typelessdata) and arr.size > 0

The purpose of _typelessdata is to ensure that np.array_repr omits the dtype from the string representation of arrays whose dtype happens to be the same as the (platform-dependent) native Python integer type.

For example, on a 32-bit system, where int is 32-bit:

In [1]: np.array_repr(np.intc([1]))
Out[1]: 'array([1])'

In [2]: np.array_repr(np.longlong([1]))
Out[2]: 'array([1], dtype=int64)'

whereas on a 64-bit system, where int is 64-bit:

In [1]: np.array_repr(np.intc([1]))
Out[1]: 'array([1], dtype=int32)'

In [2]: np.array_repr(np.longlong([1]))
Out[2]: 'array([1])'

The arr.dtype.type in _typelessdata check in the line above ensures that printing the dtype is skipped for the appropriate platform-dependent native integer dtypes.
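
The effect can be observed without touching the private list. This is a rough sketch that relies only on the public np.array_repr; the helper shows_dtype is a made-up name. The private _typelessdata list may live elsewhere or be implemented differently in other NumPy versions, so which integer types print with a dtype is left for you to observe on your own platform rather than asserted here.

```python
import numpy as np

def shows_dtype(arr):
    """True if np.array_repr spells out the dtype for this array.

    shows_dtype is a hypothetical helper for this example only.
    """
    return "dtype=" in np.array_repr(arr)

samples = (
    np.array([1]),                      # platform's default integer dtype
    np.array([1], dtype=np.intc),       # C int
    np.array([1], dtype=np.longlong),   # C long long
    np.array([1.0]),                    # float64, the default float dtype
    np.array([1.0], dtype=np.float32),  # non-default float
)

for arr in samples:
    # dtype.char separates C-level types that share a printed name.
    print(arr.dtype.char, np.array_repr(arr), shows_dtype(arr))
```

The default integer and float64 arrays print without a dtype, while float32 always carries one; the intc and longlong rows are the platform-dependent ones discussed above.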

answered Oct 23 '22 23:10 by ali_m


I don't know the full history behind it, but the second int64 is actually numpy.longlong.

In [1]: import numpy as np

In [2]: from numpy.core.numeric import _typelessdata

In [3]: _typelessdata
Out[3]: [numpy.int64, numpy.float64, numpy.complex128, numpy.int64]

In [4]: id(_typelessdata[-1]) == id(np.longlong)
Out[4]: True

numpy.longlong is supposed to directly correspond to C's long long type. C's long long is specified to be at least 64 bits wide, but the exact definition is left up to the compiler.

My guess is that numpy.longlong winds up being a second, distinct 64-bit integer type that prints as numpy.int64 on most systems, but is allowed to be something different if the C compiler defines long long as something wider than 64 bits.
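
Type-object identity and dtype equivalence are separate questions, which is why two entries can look the same yet fail an is or == test on the types themselves. The sketch below compares both; compare is a made-up helper name, and the printed values depend on your platform and NumPy version. The dtype character code ('i' for C int, 'l' for C long, 'q' for C long long) is one way to tell apart types that print with the same name.

```python
import numpy as np

def compare(a, b):
    """Return (same type object?, equivalent dtypes?) for two scalar types.

    compare is a hypothetical helper for this example only.
    """
    return a is b, np.dtype(a) == np.dtype(b)

for t in (np.intc, np.int_, np.longlong):
    dt = np.dtype(t)
    same_type, same_dtype = compare(t, np.int64)
    # same_type can be False even when same_dtype is True.
    print(t.__name__, dt.char, dt.itemsize, same_type, same_dtype)
```

On a platform where both long and long long are 64 bits wide, one of these types is np.int64 itself and the other is a separate type object whose dtype still compares equal to it.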

answered Oct 24 '22 00:10 by Joe Kington