In numpy, why does subtraction of integers sometimes produce floating point numbers?
>>> x = np.int64(2) - np.uint64(1)
>>> x
1.0
>>> x.dtype
dtype('float64')
This seems to occur only when mixing different integer types (e.g. signed and unsigned), and only when no larger integer type is available.
A Quick Introduction to Numpy Subtract
When you use np.subtract on two same-sized NumPy arrays, the function subtracts the elements of the second array from the elements of the first array. It performs this subtraction in an "element-wise" fashion.
The most straightforward way to subtract two matrices in NumPy is the - operator, which is shorthand for np.subtract(), the NumPy function designed for subtracting arrays and other array-like objects such as matrices.
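A minimal sketch of the element-wise behaviour described above, using two small arrays chosen purely for illustration. It shows that np.subtract(a, b) and a - b produce the same result on ndarrays.

```python
import numpy as np

a = np.array([10, 20, 30])
b = np.array([1, 2, 3])

# Element-wise: diff_func[i] == a[i] - b[i]
diff_func = np.subtract(a, b)
# The - operator on ndarrays goes through the same subtract ufunc
diff_op = a - b

print(diff_func)                            # [ 9 18 27]
print(np.array_equal(diff_func, diff_op))   # True
```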
This is a conscious design decision by the numpy authors. When deciding on the resulting type, only the types of the operands are considered, not their actual values. And for the operation you perform, there is a risk of having a result outside the valid range of either integer type: e.g. if you subtract a very large uint64 number from an int64, the result would not fit in an int64, and a negative result cannot be stored in a uint64 at all. Since no integer type can hold every int64 and every uint64 value, the safe selection is thus to convert to float64, which can certainly hold the magnitude of the result (possibly with reduced precision, though).
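A small sketch of both halves of that trade-off, assuming a reasonably recent NumPy (1.x or 2.x, where int64 combined with uint64 promotes to float64). The specific values are picked only to hit the extremes: the first subtraction cannot be represented by any 64-bit integer type, and the second shows the precision loss from float64's 53-bit significand.

```python
import numpy as np

# Only the operand types matter: int64 combined with uint64 promotes to float64
print(np.promote_types(np.int64, np.uint64))  # float64

# Extreme case: 0 - (2**64 - 1) fits in neither int64 nor uint64,
# but float64 can hold a number of that magnitude
huge = np.uint64(2**64 - 1)
r = np.int64(0) - huge
print(r.dtype)  # float64

# The cost is precision: 2**53 + 1 needs 54 significant bits, so it gets rounded
exact = np.int64(2**53 + 1) - np.uint64(0)
print(int(exact) == 2**53 + 1)  # False
```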
Compare with an example of x = np.int32(2) - np.uint32(1). Every int32 and every uint32 value can be represented exactly as an int64, therefore that type is chosen. The same would be true for x = np.int64(2) - np.uint32(1). This will also yield an int64.
The alternative would be to follow e.g. the C rules, which would convert both operands to uint64. But that could, of course, lead to very strange results with over/underflows.
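To illustrate the kind of strange result meant here, the sketch below emulates the C-style rule by hand, forcing both operands to uint64 (a hypothetical setup for comparison, not something NumPy does for int64 - uint64). Arrays are used rather than scalars so the wraparound happens silently, as it would in C.

```python
import numpy as np

# Emulated C rule (illustration only): both operands treated as uint64
a = np.array([2], dtype=np.uint64)
b = np.array([3], dtype=np.uint64)

# Unsigned arithmetic wraps modulo 2**64, so 2 - 3 becomes 2**64 - 1, not -1
wrapped = a - b
print(wrapped[0])  # 18446744073709551615

# NumPy's actual choice for int64 - uint64 keeps the sign by going to float64
print(np.array([2], dtype=np.int64) - b)  # [-1.]
```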
If you want to know ahead of time what type you will end up with, look into np.result_type(), np.can_cast(), and np.promote_types(). Reading about this in the docs might also help you understand the issue a bit better.
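A short sketch of using these three functions to check the outcome before doing any arithmetic. np.can_cast is called with its default casting="safe", which is what makes it useful for explaining the choice: neither 64-bit integer type can hold every value of the other, while NumPy does treat casting either of them to float64 as "safe".

```python
import numpy as np

# Which dtype will the subtraction produce?
print(np.promote_types(np.int64, np.uint64))  # float64
print(np.result_type(np.int32, np.uint32))    # int64

# Why: neither 64-bit integer type can safely hold the other
print(np.can_cast(np.uint64, np.int64))   # False
print(np.can_cast(np.int64, np.uint64))   # False

# int64 can hold every uint32 value, so int32/uint32 stays an integer type
print(np.can_cast(np.uint32, np.int64))   # True

# NumPy considers both 64-bit integer types safely castable to float64
print(np.can_cast(np.uint64, np.float64)) # True
print(np.can_cast(np.int64, np.float64))  # True
```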
I'm no expert on numpy; however, I suspect that since float64 is the smallest data type whose range covers both int64 and uint64, the subtraction converts both operands into a float64 so that the operation always succeeds (although float64 cannot represent every 64-bit integer exactly).
For example, take a case with int8 and uint8: 0 - 255 = -255 cannot fit in an int8, which can only hold -128 to 127. Similarly, we can't use a uint8, since we obviously need the sign in this case. For 8-bit operands numpy can simply widen to int16, but for int64 and uint64 there is no larger integer type. Hence, we settle on a float/double, as it can fit both directions fine (at the cost of exactness for very large values).
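To see where the fallback to floating point actually kicks in, the sketch below promotes each signed integer type with the unsigned type of the same width (assuming a reasonably recent NumPy). Only the 64-bit pair, for which no larger integer type exists, ends up as float64; the smaller pairs widen to the next signed integer type.

```python
import numpy as np

pairs = [
    (np.int8, np.uint8),
    (np.int16, np.uint16),
    (np.int32, np.uint32),
    (np.int64, np.uint64),
]

# Map "signed/unsigned" names to the dtype NumPy promotes the pair to
results = {}
for signed, unsigned in pairs:
    key = f"{np.dtype(signed).name}/{np.dtype(unsigned).name}"
    results[key] = np.promote_types(signed, unsigned)
    print(key, "->", results[key])
# int8/uint8 -> int16
# int16/uint16 -> int32
# int32/uint32 -> int64
# int64/uint64 -> float64
```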