
NumPy data type comparison

I was playing with comparing data types of two different arrays to pick one that is suitable for combining the two. I was happy to discover that I could perform comparison operations, but in the process discovered the following strange behavior:

In [1]: numpy.int16 > numpy.float32
Out[1]: True

In [2]: numpy.dtype('int16') > numpy.dtype('float32')
Out[2]: False

Can anyone explain what is going on here? This is NumPy 1.8.2.

farenorth asked Apr 17 '15 21:04


2 Answers

The first comparison is not meaningful; the second is.

With numpy.int16 > numpy.float32 we are comparing two type objects:

>>> type(numpy.int16)
type
>>> numpy.int16 > numpy.float32 # I'm using Python 3
TypeError: unorderable types: type() > type()

In Python 3 this comparison fails immediately, since no ordering is defined for type objects. In Python 2 a boolean is returned, but it cannot be relied upon: the result falls back to implementation details such as memory addresses.
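A minimal sketch of the Python 3 behaviour described above, assuming NumPy is installed; the variable name result is only illustrative. The exact wording of the error message varies between Python versions, so it is printed rather than compared.

```python
import numpy

# numpy.int16 and numpy.float32 are plain type objects, so Python 3
# refuses to order them and raises TypeError.
try:
    result = numpy.int16 > numpy.float32
except TypeError as exc:
    result = None
    print("TypeError:", exc)
```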

The second comparison does work in Python 3, and it works consistently (same in Python 2). This is because we're now comparing dtype instances:

>>> type(numpy.dtype('int16'))
numpy.dtype
>>> numpy.dtype('int16') > numpy.dtype('float32')
False
>>> numpy.dtype('int32') < numpy.dtype('|S10')
False
>>> numpy.dtype('int32') < numpy.dtype('|S11')
True

What's the logic behind this ordering?

dtype instances are ordered by safe castability: one dtype is less than another if the two are not equivalent and the first can be safely cast to the second.
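As a rough illustration of that rule (a sketch, not part of the original answer), the snippet below compares a few native-byte-order numeric dtypes. It also shows that the ordering is only partial: when neither dtype can be safely cast to the other, both < and > come back False.

```python
import numpy as np

int16 = np.dtype("int16")
int32 = np.dtype("int32")
float32 = np.dtype("float32")

# Every int16 value is exactly representable as a float32,
# so int16 sorts below float32.
print(int16 < float32)   # True
print(int16 > float32)   # False

# int32 -> float32 can lose precision and float32 -> int32 drops the
# fractional part, so neither direction is a safe cast.
print(int32 < float32)   # False
print(int32 > float32)   # False
```

So a False result from > does not imply that <= would be True in the usual numeric sense; it may just mean the two dtypes are not comparable under this rule.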

For the implementation of the comparison operators, look at descriptor.c; specifically at the arraydescr_richcompare function.

Here's what the < operator maps to:

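switch (cmp_op) {
case Py_LT:
    /* "less than": not equivalent, and self can be cast to new */
    if (!PyArray_EquivTypes(self, new) && PyArray_CanCastTo(self, new)) {
        result = Py_True;
    }
    else {
        result = Py_False;
    }
    break;
/* ... remaining comparison operators omitted from this excerpt ... */
}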

Essentially, NumPy checks that (i) the two types are not equivalent and (ii) the first type can be cast to the second.

This functionality is also exposed in the NumPy API as np.can_cast:

>>> np.can_cast('int32', '|S10')
False
>>> np.can_cast('int32', '|S11')
True
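Putting the two conditions together, the < operator can be approximated at the Python level. This is a sketch only: dtype_less_than and common_dtype are hypothetical helper names, not NumPy functions, and the equivalence check uses != as a stand-in for PyArray_EquivTypes. For the original goal of picking a dtype that can hold both arrays, np.promote_types may be the more direct tool, although that suggestion is an addition here rather than something the answer states.

```python
import numpy as np

def dtype_less_than(a, b):
    """Approximate `np.dtype(a) < np.dtype(b)` using np.can_cast."""
    a, b = np.dtype(a), np.dtype(b)
    # (i) not equivalent, and (ii) a can be safely cast to b
    return a != b and np.can_cast(a, b, casting="safe")

def common_dtype(a, b):
    """Smallest dtype to which both a and b can be safely cast."""
    return np.promote_types(a, b)

print(dtype_less_than("int16", "float32"))  # True
print(common_dtype("int16", "float32"))     # float32
print(common_dtype("int32", "float32"))     # float64
```

Unlike the comparison operators, np.promote_types still gives a usable answer when the two dtypes are unordered, as in the int32/float32 case.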
Alex Riley answered Oct 06 '22 07:10


It's nothing interesting. Python 2 tries to provide consistent but meaningless comparison results for objects that don't define how to compare themselves with each other. The developers decided that was a mistake, and in Python 3, these comparisons will raise a TypeError.

user2357112 supports Monica answered Oct 06 '22 05:10