I need to pack some extra information into floating point NaN values. I am using single-precision IEEE 754 floats (32-bit floats) in Python. How do Python and NumPy treat these values? Theory The IEEE 754-2008 standard seems to think a number is really not a number, if the exponent bits (23..30) are set, and at least one of the significand bits is set. Thus if we transform the float into a 32-bit integer representation, anything satisfying the following conditions goes: <ul> <li><code>i & 0x7f800000 == 0x7f800000</code></li> <li><code>i & 0x007fffff != 0</code></li> </ul> This would leave me with plenty of choice. However, the standard seems to say that the highest bit of the significand is is_quiet and should be set to avoid exceptions in calculations. Practical tests Python 2.7 In order to be sure, I ran some tests with interesting results: <pre class="prettyprint"><code>import math import struct std_nan = struct.unpack("f4", struct.pack("I", 0x7fc00000))[0] spec_nan = struct.unpack("f4", struct.pack("I", 0x7f800001))[0] spec2_nan = struct.unpack("f4", struct.pack("I", 0x7fc00001))[0] print "{:08x}".format(struct.unpack("I", struct.pack("f4", std_nan))[0]) print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec_nan))[0]) print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec2_nan))[0]) </code></pre> This gives: <pre class="prettyprint"><code>7fc00000 7fc00001 <<< should be 7f800001 7fc00001 </code></pre> This and some further testing seems to suggst that something (<code>struct.unpack</code>?) always sets the is_quiet bit. NumPy I tried the same with NumPy, because there I can always rely on conversions not changing a single bit: <pre class="prettyprint"><code>import numpy as np intarr = np.array([0x7f800001], dtype='uint32') f = np.fromstring(intarr.tostring(), dtype='f4') print np.isnan(f) </code></pre> This gives: <pre class="prettyprint"><code>RuntimeWarning: invalid value encountered in isnan [True] </code></pre> but if the value is replaced by <code>0x7fc00001</code>, there is no error. Hypothesis Both Python and NumPy will be happy, if I set the is_quiet and use the rest of the bits for my own purposes. Python handles the bit by itself, NumPy relies on lower-level language implementations and/or the hardware FP implementation. Question Is my hypothesis correct, and can it be proved or disproved by some official documentation? Or is it one of those platform-dependent things? I found something quite related here: How to distinguish different types of NaN float in Python, but I could not find any official word on how extra-information-carrying NaNs should be handled in Python or NumPy.

After thinking of this for some time and having a look at the source code ad then rethinking a bit, I think I can answer my own question. My hypotheses are almost correct but not the whole story. As NumPy and Python handle numbers quite differently, this answer has two parts. What really happens in Python and NumPy with NaNs NumPy This may be slightly platform-specific, but on most platforms NumPy uses the <code>gcc</code> builtin <code>isnan</code>, which in turn does something fast. The runtime warnings come from the deeper levels, from the hardware in most cases. (NumPy may use on of several methods of determining the NaN status, such as x != x, which works on at least AMD 64 platforms, but with <code>gcc</code> it is down to <code>gcc</code>, which probably uses some pretty short code for the purpose.) So, in theory there is no way to guarantee how NumPy handles NaNs, but in practice on the more common platforms it will do as the standard says because that's what the hardware does. NumPy itself does not care about the NaN types at all. (Except for some NumPy-specific non-hw-supported data types and platforms.) Python Here the story becomes interesting. If the platform supports IEEE floats (most do), Python uses the C library for floating point arithmetics, and thus almost directly hardware instructions in most cases. So there should not be any difference to NumPy. Except for... There is usually no such thing as a 32-bit float in Python. Python float objects use C <code>double</code>, which is a 64-bit format. How does one transform special NaNs between these formats? In order to see what happens in practice, the following little C code helps: <pre class="prettyprint"><code>/* nantest.c - Test floating point nan behaviour with type casts */ #include <stdio.h> #include <stdint.h> static uint32_t u1 = 0x7fc00000; static uint32_t u2 = 0x7f800001; static uint32_t u3 = 0x7fc00001; int main(void) { float f1, f2, f3; float f1p, f2p, f3p; double d1, d2, d3; uint32_t u1p, u2p, u3p; uint64_t l1, l2, l3; // Convert uint32 -> float f1 = *(float *)&u1; f2 = *(float *)&u2; f3 = *(float *)&u3; // Convert float -> double (type cast, real conversion) d1 = (double)f1; d2 = (double)f2; d3 = (double)f3; // Convert the doubles into long ints l1 = *(uint64_t *)&d1; l2 = *(uint64_t *)&d2; l3 = *(uint64_t *)&d3; // Convert the doubles back to floats f1p = (float)d1; f2p = (float)d2; f3p = (float)d3; // Convert the floats back to uints u1p = *(uint32_t *)&f1p; u2p = *(uint32_t *)&f2p; u3p = *(uint32_t *)&f3p; printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f1, u1, d1, l1, f1p, u1p); printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f2, u2, d2, l2, f2p, u2p); printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f3, u3, d3, l3, f3p, u3p); return 0; } </code></pre> This prints: <pre class="prettyprint"><code>nan (7fc00000) -> nan (7ff8000000000000) -> nan (7fc00000) nan (7f800001) -> nan (7ff8000020000000) -> nan (7fc00001) nan (7fc00001) -> nan (7ff8000020000000) -> nan (7fc00001) </code></pre> By looking at row 2 it is obvious that we have the same phenomenon as we had with Python. So, it is the conversion to <code>double</code> that introduces the extra is_quiet bit immediately after the exponent in the 64-bit version. This sounds a bit strange, but actually the standard says (IEEE 754-2008, section 6.2.3): Conversion of a quiet NaN from a narrower format to a wider format in the same radix, and then back to the same narrower format, should not change the quiet NaN payload in any way except to make it canonical. This does not say anything about the propagation of signaled NaN's. However, that is explained by section 6.2.1.: For binary formats, the payload is encoded in the p − 2 least significant bits of the trailing significand field. The p above is precision, 24 bits for a 32-bit float. So, my mistake was to use signaled NaNs for payload. Summary I got the following take home points: <ul> <li>the use of qNaNs (quiet NaNs) is supported and encouraged by the IEEE 754-2008</li> <li>odd results were because I tried to use sNaNs and type conversions resulted in the is_quiet bit being set</li> <li>both NumPy and Python act according to IEEE 754 on the most common platforms</li> <li>the implementation leans heavily on the underlying C implementation and thus guarantees very little (there is even some code in Python which acknowledges that NaNs are not handled as they should be on some platforms)</li> <li>the only safe way to handle this is to do a bit of DIY with the payload</li> </ul> There is however, one thing that is implemented in neither Python nor NumPy (nor any other language I have come across with). Section 5.12.1: Language standards should provide an optional conversion of NaNs in a supported format to external character sequences which appends to the basic NaN character sequences a suffix that can represent the NaN payload (see 6.2). The form and interpretation of the payload suffix is language-defined. The language standard shall require that any such optional output sequences be accepted as input in conversion of external character sequences to supported formats.

Behaviour of custom NaN floats in Python and Numpy

Tags:

python

numpy

python-2.7

I need to pack some extra information into floating point NaN values. I am using single-precision IEEE 754 floats (32-bit floats) in Python. How do Python and NumPy treat these values?

Theory

The IEEE 754-2008 standard seems to think a number is really not a number, if the exponent bits (23..30) are set, and at least one of the significand bits is set. Thus if we transform the float into a 32-bit integer representation, anything satisfying the following conditions goes:

i & 0x7f800000 == 0x7f800000
i & 0x007fffff != 0

This would leave me with plenty of choice. However, the standard seems to say that the highest bit of the significand is is_quiet and should be set to avoid exceptions in calculations.

Practical tests

Python 2.7

In order to be sure, I ran some tests with interesting results:

import math
import struct

std_nan = struct.unpack("f4", struct.pack("I", 0x7fc00000))[0]
spec_nan = struct.unpack("f4", struct.pack("I", 0x7f800001))[0]
spec2_nan = struct.unpack("f4", struct.pack("I", 0x7fc00001))[0]

print "{:08x}".format(struct.unpack("I", struct.pack("f4", std_nan))[0])
print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec_nan))[0])
print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec2_nan))[0])

This gives:

7fc00000
7fc00001 <<< should be 7f800001
7fc00001

This and some further testing seems to suggst that something (struct.unpack?) always sets the is_quiet bit.

NumPy

I tried the same with NumPy, because there I can always rely on conversions not changing a single bit:

import numpy as np

intarr = np.array([0x7f800001], dtype='uint32')
f = np.fromstring(intarr.tostring(), dtype='f4')
print np.isnan(f)

This gives:

RuntimeWarning: invalid value encountered in isnan
[True]

but if the value is replaced by 0x7fc00001, there is no error.

Hypothesis

Both Python and NumPy will be happy, if I set the is_quiet and use the rest of the bits for my own purposes. Python handles the bit by itself, NumPy relies on lower-level language implementations and/or the hardware FP implementation.

Question

Is my hypothesis correct, and can it be proved or disproved by some official documentation? Or is it one of those platform-dependent things?

I found something quite related here: How to distinguish different types of NaN float in Python, but I could not find any official word on how extra-information-carrying NaNs should be handled in Python or NumPy.

252

asked Jul 06 '14 22:07

DrV

1 Answers

After thinking of this for some time and having a look at the source code ad then rethinking a bit, I think I can answer my own question. My hypotheses are almost correct but not the whole story.

As NumPy and Python handle numbers quite differently, this answer has two parts.

What really happens in Python and NumPy with NaNs

NumPy

This may be slightly platform-specific, but on most platforms NumPy uses the gcc builtin isnan, which in turn does something fast. The runtime warnings come from the deeper levels, from the hardware in most cases. (NumPy may use on of several methods of determining the NaN status, such as x != x, which works on at least AMD 64 platforms, but with gcc it is down to gcc, which probably uses some pretty short code for the purpose.)

So, in theory there is no way to guarantee how NumPy handles NaNs, but in practice on the more common platforms it will do as the standard says because that's what the hardware does. NumPy itself does not care about the NaN types at all. (Except for some NumPy-specific non-hw-supported data types and platforms.)

Python

Here the story becomes interesting. If the platform supports IEEE floats (most do), Python uses the C library for floating point arithmetics, and thus almost directly hardware instructions in most cases. So there should not be any difference to NumPy.

Except for... There is usually no such thing as a 32-bit float in Python. Python float objects use C double, which is a 64-bit format. How does one transform special NaNs between these formats? In order to see what happens in practice, the following little C code helps:

/* nantest.c - Test floating point nan behaviour with type casts */

#include <stdio.h>
#include <stdint.h>

static uint32_t u1 = 0x7fc00000;
static uint32_t u2 = 0x7f800001;
static uint32_t u3 = 0x7fc00001;

int main(void)
    {
    float f1, f2, f3;
    float f1p, f2p, f3p;
    double d1, d2, d3;
    uint32_t u1p, u2p, u3p;
    uint64_t l1, l2, l3;

    // Convert uint32 -> float
    f1 = *(float *)&u1; f2 = *(float *)&u2; f3 = *(float *)&u3;

    // Convert float -> double (type cast, real conversion)
    d1 = (double)f1; d2 = (double)f2; d3 = (double)f3;

    // Convert the doubles into long ints
    l1 = *(uint64_t *)&d1; l2 = *(uint64_t *)&d2; l3 = *(uint64_t *)&d3;

    // Convert the doubles back to floats
    f1p = (float)d1; f2p = (float)d2; f3p = (float)d3;

    // Convert the floats back to uints
    u1p = *(uint32_t *)&f1p; u2p = *(uint32_t *)&f2p; u3p = *(uint32_t *)&f3p;

    printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f1, u1, d1, l1, f1p, u1p);
    printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f2, u2, d2, l2, f2p, u2p);
    printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f3, u3, d3, l3, f3p, u3p);

    return 0;
    }

This prints:

nan (7fc00000) -> nan (7ff8000000000000) -> nan (7fc00000)
nan (7f800001) -> nan (7ff8000020000000) -> nan (7fc00001)
nan (7fc00001) -> nan (7ff8000020000000) -> nan (7fc00001)

By looking at row 2 it is obvious that we have the same phenomenon as we had with Python. So, it is the conversion to double that introduces the extra is_quiet bit immediately after the exponent in the 64-bit version.

This sounds a bit strange, but actually the standard says (IEEE 754-2008, section 6.2.3):

Conversion of a quiet NaN from a narrower format to a wider format in the same radix, and then back to the same narrower format, should not change the quiet NaN payload in any way except to make it canonical.

This does not say anything about the propagation of signaled NaN's. However, that is explained by section 6.2.1.:

For binary formats, the payload is encoded in the p − 2 least significant bits of the trailing significand field.

The p above is precision, 24 bits for a 32-bit float. So, my mistake was to use signaled NaNs for payload.

Summary

I got the following take home points:

the use of qNaNs (quiet NaNs) is supported and encouraged by the IEEE 754-2008
odd results were because I tried to use sNaNs and type conversions resulted in the is_quiet bit being set
both NumPy and Python act according to IEEE 754 on the most common platforms
the implementation leans heavily on the underlying C implementation and thus guarantees very little (there is even some code in Python which acknowledges that NaNs are not handled as they should be on some platforms)
the only safe way to handle this is to do a bit of DIY with the payload

There is however, one thing that is implemented in neither Python nor NumPy (nor any other language I have come across with). Section 5.12.1:

Language standards should provide an optional conversion of NaNs in a supported format to external character sequences which appends to the basic NaN character sequences a suffix that can represent the NaN payload (see 6.2). The form and interpretation of the payload suffix is language-defined. The language standard shall require that any such optional output sequences be accepted as input in conversion of external character sequences to supported formats.

145

answered Nov 04 '22 13:11

DrV

Related questions
                            
                                How can a function refer stably to itself?
                            
                                Better rounding in Python's NumPy.around: Rounding NumPy Arrays
                            
                                argparse: How can I allow multiple values to override a default
                            
                                Google Analytics API access without local browser in python
                            
                                Creating tuples of all possible combinations of items from two lists, without duplicating items within tuples
                            
                                predict_proba or decision_function as estimator "confidence"
                            
                                Repeatedly write to STDIN and read STDOUT of a Subprocess without closing it
                            
                                How do I debug a non-functioning PyInstaller build?
                            
                                Python os.pipe vs multiprocessing.Pipe
                            
                                When should I use underscores between words in Python function names (according to the style guide)?
                            
                                Python module and object names clash
                            
                                How to redirect stdout to a Tkinter Text widget
                            
                                Python is there an ORM for SQL and NoSQL
                            
                                Geometric warp of image in python
                            
                                In pandas is there something like a GroupBy.get_group, but with an optional default value?
                            
                                How to read from a text file compressed with 7z?
                            
                                Count how many matrices have full rank for all submatrices
                            
                                Mock a class in Python in order to call one of its methods
                            
                                Python in emacs: __name__ == '__main__', but somehow not
                            
                                How to unittest that a thread is spawned?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With