
Why is the dtype shown (even if it's the native one) when using floor division with NumPy?

Normally the dtype is hidden when it's equivalent to the native type:

>>> import numpy as np
>>> np.arange(5)
array([0, 1, 2, 3, 4])
>>> np.arange(5).dtype
dtype('int32')

>>> np.arange(5) + 3
array([3, 4, 5, 6, 7])

But somehow that doesn't apply to floor division or modulo:

>>> np.arange(5) // 3
array([0, 0, 0, 1, 1], dtype=int32)
>>> np.arange(5) % 3
array([0, 1, 2, 0, 1], dtype=int32)

Why is there a difference?

Python 3.5.4, NumPy 1.13.1, Windows 64bit

MSeifert asked Sep 18 '17 17:09




2 Answers

It comes down to a difference in the dtype, as can be seen from the view:

In [186]: x = np.arange(10)
In [187]: y = x // 3
In [188]: x
Out[188]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [189]: y
Out[189]: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3], dtype=int32)
In [190]: x.view(y.dtype)
Out[190]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [191]: y.view(x.dtype)
Out[191]: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])

Even though the two dtypes have identical descr values, some attribute must differ. But which?

In [192]: x.dtype.descr
Out[192]: [('', '<i4')]
In [193]: y.dtype.descr
Out[193]: [('', '<i4')]

In [204]: x.dtype.type
Out[204]: numpy.int32
In [205]: y.dtype.type
Out[205]: numpy.int32
In [207]: x.dtype.type is y.dtype.type
Out[207]: False

In [243]: np.core.numeric._typelessdata
Out[243]: [numpy.int32, numpy.float64, numpy.complex128]
In [245]: x.dtype.type in np.core.numeric._typelessdata
Out[245]: True
In [246]: y.dtype.type in np.core.numeric._typelessdata
Out[246]: False

So y's dtype.type looks the same as x's by all appearances, but it is a different object with a different id:

In [261]: id(np.int32)
Out[261]: 3045777728
In [262]: id(x.dtype.type)
Out[262]: 3045777728
In [263]: id(y.dtype.type)
Out[263]: 3045777952
In [282]: id(np.intc)
Out[282]: 3045777952

Add this extra type to the list, and y no longer shows the dtype:

In [267]: np.core.numeric._typelessdata.append(y.dtype.type)
In [269]: y
Out[269]: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])

So y.dtype.type is np.intc (and np.intp), while x.dtype.type is np.int32 (and np.int_).
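
To check which of these scalar types coincide on a given machine, a small sketch like the following may help. The helper name compare_int_aliases and the C_INT_TYPES table are made up for illustration and are not part of NumPy. The result depends on the platform and the NumPy version, so no output is shown.

```python
import numpy as np

# C-named signed integer scalar types exposed by NumPy.
# (Hypothetical lookup table, used only for this illustration.)
C_INT_TYPES = {
    "intc": np.intc,          # C int
    "int_": np.int_,          # NumPy's default integer type
    "longlong": np.longlong,  # C long long
    "intp": np.intp,          # integer large enough to hold an index
}


def compare_int_aliases():
    """Return (name_a, name_b, same_object) for every pair of equal-width types."""
    names = sorted(C_INT_TYPES)
    report = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            type_a, type_b = C_INT_TYPES[a], C_INT_TYPES[b]
            # Only pairs with the same item size can be mistaken for each other.
            if np.dtype(type_a).itemsize == np.dtype(type_b).itemsize:
                report.append((a, b, type_a is type_b))
    return report


for a, b, same in compare_int_aliases():
    print("np.%s and np.%s: same width, same type object? %s" % (a, b, same))
```

Any pair reported with the same width but False for "same type object" is a candidate for the effect described here: two types that print alike but compare as different objects.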

So to make an array that displays the dtype, use np.intc.

In [23]: np.arange(10,dtype=np.int_)
Out[23]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [24]: np.arange(10,dtype=np.intc)
Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

And to turn this off, append np.intc to np.core.numeric._typelessdata.

hpaulj answered Oct 12 '22 14:10


You actually have multiple distinct 32-bit integer dtypes here. This is probably a bug.

NumPy has (accidentally?) created multiple distinct signed 32-bit integer types, probably corresponding to C int and long. Both of them display as numpy.int32, but they're actually different objects. At C level, I believe the type objects are PyIntArrType_Type and PyLongArrType_Type.

dtype objects have a type attribute corresponding to the type object of scalars of that dtype. It is this type attribute that NumPy inspects when deciding whether to print dtype information in an array's repr:

_typelessdata = [int_, float_, complex_]
if issubclass(intc, int):
    _typelessdata.append(intc)


if issubclass(longlong, int):
    _typelessdata.append(longlong)

...

def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    ...
    skipdtype = (arr.dtype.type in _typelessdata) and arr.size > 0

    if skipdtype:
        return "%s(%s)" % (class_name, lst)
    else:
        ...
        return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename)
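
The effect of that membership test can be reproduced without NumPy. The sketch below is a simplified stand-in, not NumPy's code: IntFromLong, IntFromInt, TYPELESS and sketch_repr are hypothetical names, and the two classes merely imitate two scalar types that share the display name int32.

```python
# Two distinct classes that share one display name, imitating the two
# 32-bit scalar types. (Hypothetical stand-ins, not NumPy objects.)
IntFromLong = type("int32", (), {})
IntFromInt = type("int32", (), {})

# Stand-in for _typelessdata: only one of the two classes is listed.
TYPELESS = [IntFromLong]


def sketch_repr(values, scalar_type, typeless=TYPELESS):
    """Format values the way array_repr decides whether to show the dtype."""
    body = ", ".join(str(v) for v in values)
    # Classes compare by identity, so a same-named class does not match.
    skipdtype = scalar_type in typeless and len(values) > 0
    if skipdtype:
        return "array([%s])" % body
    return "array([%s], dtype=%s)" % (body, scalar_type.__name__)


print(sketch_repr([0, 1, 2], IntFromLong))  # array([0, 1, 2])
print(sketch_repr([0, 1, 2], IntFromInt))   # array([0, 1, 2], dtype=int32)
```

Passing a list that also contains IntFromInt as the typeless argument makes the second call drop the dtype as well, which mirrors appending the extra type to _typelessdata.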

On numpy.arange(5) and numpy.arange(5) + 3, .dtype.type is numpy.int_; on numpy.arange(5) // 3 or numpy.arange(5) % 3, .dtype.type is the other 32-bit signed integer type.

As for why + and // have different output dtypes, they use different type resolution routines. //'s type resolution looks for a ufunc inner loop that takes types the inputs can be safely cast to, while +'s type resolution applies NumPy type promotion to the arguments and picks the loop matching the resulting type.
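
A simplified model of the loop search described for // is sketched below. It walks the ufunc's types list in order and returns the first loop signature whose input types the given dtypes can be safely cast to. The function name first_safe_loop is hypothetical, and NumPy's real resolver is written in C and handles more cases, so treat this as an approximation only.

```python
import numpy as np


def first_safe_loop(ufunc, *input_dtypes):
    """Return the first loop signature that all inputs can be safely cast to.

    Simplified model of a linear loop search; not NumPy's actual resolver.
    """
    for signature in ufunc.types:  # e.g. "ii->i", "ll->l", "dd->d"
        inputs, _, _outputs = signature.partition("->")
        if len(inputs) != len(input_dtypes):
            continue
        if all(
            np.can_cast(src, np.dtype(char), casting="safe")
            for src, char in zip(input_dtypes, inputs)
        ):
            return signature
    return None


int_c = np.dtype(np.intc)     # C int
default = np.dtype(np.int_)   # NumPy's default integer

print(first_safe_loop(np.floor_divide, int_c, int_c))  # ii->i
print(first_safe_loop(np.floor_divide, default, default))  # platform dependent
```

On a platform where C int and long have the same width, a search of this kind reaches the "ii->i" loop before "ll->l", which would be consistent with the np.intc result seen for // above.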

user2357112 supports Monica answered Oct 12 '22 13:10