I need to pack some extra information into floating point NaN values. I am using single-precision IEEE 754 floats (32-bit floats) in Python. How do Python and NumPy treat these values?
Theory
The IEEE 754-2008 standard seems to think a number is really not a number, if the exponent bits (23..30) are set, and at least one of the significand bits is set. Thus if we transform the float into a 32-bit integer representation, anything satisfying the following conditions goes:
i & 0x7f800000 == 0x7f800000
i & 0x007fffff != 0
This would leave me with plenty of choice. However, the standard seems to say that the highest bit of the significand is is_quiet and should be set to avoid exceptions in calculations.
Practical tests
Python 2.7
In order to be sure, I ran some tests with interesting results:
import math
import struct
std_nan = struct.unpack("f4", struct.pack("I", 0x7fc00000))[0]
spec_nan = struct.unpack("f4", struct.pack("I", 0x7f800001))[0]
spec2_nan = struct.unpack("f4", struct.pack("I", 0x7fc00001))[0]
print "{:08x}".format(struct.unpack("I", struct.pack("f4", std_nan))[0])
print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec_nan))[0])
print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec2_nan))[0])
This gives:
7fc00000
7fc00001 <<< should be 7f800001
7fc00001
This and some further testing seems to suggst that something (struct.unpack
?) always sets the is_quiet bit.
NumPy
I tried the same with NumPy, because there I can always rely on conversions not changing a single bit:
import numpy as np
intarr = np.array([0x7f800001], dtype='uint32')
f = np.fromstring(intarr.tostring(), dtype='f4')
print np.isnan(f)
This gives:
RuntimeWarning: invalid value encountered in isnan
[True]
but if the value is replaced by 0x7fc00001
, there is no error.
Hypothesis
Both Python and NumPy will be happy, if I set the is_quiet and use the rest of the bits for my own purposes. Python handles the bit by itself, NumPy relies on lower-level language implementations and/or the hardware FP implementation.
Question
Is my hypothesis correct, and can it be proved or disproved by some official documentation? Or is it one of those platform-dependent things?
I found something quite related here: How to distinguish different types of NaN float in Python, but I could not find any official word on how extra-information-carrying NaNs should be handled in Python or NumPy.
NaN stands for Not A Number and is a common missing data representation. It is a special floating-point value and cannot be converted to any other type than float.
In Python, NumPy with the latest version where nan is a value only for floating arrays only which stands for not a number and is a numeric data type which is used to represent an undefined value. In Python, NumPy defines NaN as a constant value.
No, you can't, at least with current version of NumPy. A nan is a special value for float arrays only.
After thinking of this for some time and having a look at the source code ad then rethinking a bit, I think I can answer my own question. My hypotheses are almost correct but not the whole story.
As NumPy and Python handle numbers quite differently, this answer has two parts.
What really happens in Python and NumPy with NaNs
NumPy
This may be slightly platform-specific, but on most platforms NumPy uses the gcc
builtin isnan
, which in turn does something fast. The runtime warnings come from the deeper levels, from the hardware in most cases. (NumPy may use on of several methods of determining the NaN status, such as x != x, which works on at least AMD 64 platforms, but with gcc
it is down to gcc
, which probably uses some pretty short code for the purpose.)
So, in theory there is no way to guarantee how NumPy handles NaNs, but in practice on the more common platforms it will do as the standard says because that's what the hardware does. NumPy itself does not care about the NaN types at all. (Except for some NumPy-specific non-hw-supported data types and platforms.)
Python
Here the story becomes interesting. If the platform supports IEEE floats (most do), Python uses the C library for floating point arithmetics, and thus almost directly hardware instructions in most cases. So there should not be any difference to NumPy.
Except for... There is usually no such thing as a 32-bit float in Python. Python float objects use C double
, which is a 64-bit format. How does one transform special NaNs between these formats? In order to see what happens in practice, the following little C code helps:
/* nantest.c - Test floating point nan behaviour with type casts */
#include <stdio.h>
#include <stdint.h>
static uint32_t u1 = 0x7fc00000;
static uint32_t u2 = 0x7f800001;
static uint32_t u3 = 0x7fc00001;
int main(void)
{
float f1, f2, f3;
float f1p, f2p, f3p;
double d1, d2, d3;
uint32_t u1p, u2p, u3p;
uint64_t l1, l2, l3;
// Convert uint32 -> float
f1 = *(float *)&u1; f2 = *(float *)&u2; f3 = *(float *)&u3;
// Convert float -> double (type cast, real conversion)
d1 = (double)f1; d2 = (double)f2; d3 = (double)f3;
// Convert the doubles into long ints
l1 = *(uint64_t *)&d1; l2 = *(uint64_t *)&d2; l3 = *(uint64_t *)&d3;
// Convert the doubles back to floats
f1p = (float)d1; f2p = (float)d2; f3p = (float)d3;
// Convert the floats back to uints
u1p = *(uint32_t *)&f1p; u2p = *(uint32_t *)&f2p; u3p = *(uint32_t *)&f3p;
printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f1, u1, d1, l1, f1p, u1p);
printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f2, u2, d2, l2, f2p, u2p);
printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f3, u3, d3, l3, f3p, u3p);
return 0;
}
This prints:
nan (7fc00000) -> nan (7ff8000000000000) -> nan (7fc00000)
nan (7f800001) -> nan (7ff8000020000000) -> nan (7fc00001)
nan (7fc00001) -> nan (7ff8000020000000) -> nan (7fc00001)
By looking at row 2 it is obvious that we have the same phenomenon as we had with Python. So, it is the conversion to double
that introduces the extra is_quiet bit immediately after the exponent in the 64-bit version.
This sounds a bit strange, but actually the standard says (IEEE 754-2008, section 6.2.3):
Conversion of a quiet NaN from a narrower format to a wider format in the same radix, and then back to the same narrower format, should not change the quiet NaN payload in any way except to make it canonical.
This does not say anything about the propagation of signaled NaN's. However, that is explained by section 6.2.1.:
For binary formats, the payload is encoded in the p − 2 least significant bits of the trailing significand field.
The p above is precision, 24 bits for a 32-bit float. So, my mistake was to use signaled NaNs for payload.
Summary
I got the following take home points:
There is however, one thing that is implemented in neither Python nor NumPy (nor any other language I have come across with). Section 5.12.1:
Language standards should provide an optional conversion of NaNs in a supported format to external character sequences which appends to the basic NaN character sequences a suffix that can represent the NaN payload (see 6.2). The form and interpretation of the payload suffix is language-defined. The language standard shall require that any such optional output sequences be accepted as input in conversion of external character sequences to supported formats.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With