I'm interested in minimising the size of a protobuf message serialised from Python. Protobuf has floats (4 bytes) and doubles (8 bytes), while Python's float type is actually a C double, at least in CPython. My question is: given an instance of a Python float, is there a "fast" way of checking if the value would lose precision if it were assigned to a protobuf float (or really a C++ float)?
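To illustrate the kind of precision loss I mean (a quick sketch using struct to emulate a C float, not a proposed solution):

import struct

# Python floats are C doubles; packing with 'f' narrows to a C float.
x = 0.1
print(struct.unpack('f', struct.pack('f', x))[0] == x)      # False: bits were lost
print(struct.unpack('f', struct.pack('f', 0.5))[0] == 0.5)  # True: 0.5 fits exactly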
You can check by converting the float to its hex representation; the sign, exponent and fraction each get a separate section. Provided the fraction uses only the first 6 hex digits (the remaining 7 digits must be zero), and the 6th digit is even (so the last bit is not set), your 64-bit double will fit in a 32-bit single. In addition, the exponent must lie between -126 and 127:
import math
import re

def is_single_precision(
        f,
        _isfinite=math.isfinite,
        _singlepat=re.compile(
            r'-?0x[01]\.[0-9a-f]{5}[02468ace]0{7}p'
            r'(?:\+(?:1[01]\d|12[0-7]|[1-9]\d|\d)|'
            r'-(?:1[01]\d|12[0-6]|[1-9]\d|\d))$').match):
    # Non-finite values (inf, nan) and zero always fit; everything else must
    # use at most 23 fraction bits and an exponent in the normal float32 range.
    return not _isfinite(f) or _singlepat(f.hex()) is not None or f == 0.0
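A quick sanity check, with some values of my own:

>>> is_single_precision(0.5)               # 0x1.0000000000000p-1: fits exactly
True
>>> is_single_precision(1.2345678901e+26)  # needs more than 23 fraction bits
False
>>> is_single_precision(float('inf'))      # non-finite values always pass
True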
The float.hex() method is quite fast, faster than round-tripping via struct or numpy; you can create 1 million hex representations in under half a second:
>>> timeit.Timer('(1.2345678901e+26).hex()').autorange()
(1000000, 0.47934128501219675)
The regex engine is also pretty fast, and with name lookups optimised in the function above we can test 1 million float values in about 1.1 seconds:
>>> import random, sys
>>> testvalues = [0.0, float('inf'), float('-inf'), float('nan')] + [random.uniform(sys.float_info.min, sys.float_info.max) for _ in range(2 * 10 ** 6)]
>>> timeit.Timer('is_single_precision(f())', 'from __main__ import is_single_precision, testvalues; f = iter(testvalues).__next__').autorange()
(1000000, 1.1044921400025487)
The above works because the binary32 format allots 23 bits to the fraction and 8 bits to the exponent (stored with a bias, giving normal values an effective exponent range of -126 to 127). The regex only allows the first 23 bits of the fraction to be set, and only accepts exponents within that range.
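To make that bit budget concrete, here is one way (my own illustration, not part of the test above) to pull the three binary32 fields apart:

import struct

# binary32 layout: 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits.
bits = struct.unpack('>I', struct.pack('>f', 1.5))[0]
sign = bits >> 31
exponent = ((bits >> 23) & 0xff) - 127
fraction = bits & 0x7fffff
print(sign, exponent, hex(fraction))  # 0 0 0x400000 (the single set bit of 1.5)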
This may not be what you want, however! Take for example 1/3 or 1/10. Both are values that require approximation in floating point, and both fail the test:
>>> (1/3).hex()
'0x1.5555555555555p-2'
>>> (1/10).hex()
'0x1.999999999999ap-4'
You may have to take a heuristic approach instead: if your hex value has anything other than zeros past the first 6 digits of the fraction, or an exponent outside of the [-126, 127] range, treat the conversion to single precision as losing too much information.
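One possible shape for such a heuristic (a sketch of my own, not from this answer; close_enough and the rel_tol threshold are illustrative choices):

import struct

def close_enough(x, rel_tol=2.0 ** -20, _s=struct.Struct('f')):
    # Accept the narrowing to single precision if the round-tripped value
    # stays within a relative tolerance of the original.
    try:
        return abs(_s.unpack(_s.pack(x))[0] - x) <= rel_tol * abs(x)
    except OverflowError:  # magnitude exceeds the float32 range
        return False

With these settings close_enough(1/3) returns True: the round trip is inexact, but only by a relative error of about 3e-8.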
For completeness, here is the "round-tripping through struct" method mentioned in the comments, which has the benefit of not requiring numpy but still giving accurate results:

import math
import struct

def is_single_precision_struct(x, _s=struct.Struct("f")):
    # NaN is never equal to itself, so test it separately; packing a value
    # whose magnitude exceeds the float32 range raises OverflowError.
    try:
        return math.isnan(x) or _s.unpack(_s.pack(x))[0] == x
    except OverflowError:
        return False
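A few spot checks (values of my own choosing):

>>> is_single_precision_struct(0.5)
True
>>> is_single_precision_struct(1/3)
False
>>> is_single_precision_struct(1e300)  # far outside the float32 range
False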
A time comparison against is_single_precision_numpy() (defined below) shows the struct round trip coming out faster on my machine as well.
If you want a simple solution that covers almost all corner cases, and that will correctly detect out-of-range exponents as well as loss of information from the smaller precision, you can use NumPy to convert your candidate float to an np.float32 object and then compare with the original:
import numpy as np

def is_single_precision_numpy(floatval, _float32=np.float32):
    # float32() rounds to the nearest single-precision value; the equality
    # holds only if no information was lost in the conversion.
    return _float32(floatval) == floatval
This automatically takes care of potentially problematic cases like values in the float32 subnormal range. For example:
>>> is_single_precision_numpy(float.fromhex('0x13p-149'))
True
>>> is_single_precision_numpy(float.fromhex('0x13.8p-149'))
False
Those cases are harder to handle with the hex-based solution.
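For example, is_single_precision from the first answer normalises the subnormal away and then rejects the exponent:

>>> float.fromhex('0x13p-149').hex()  # normalised form has exponent -145
'0x1.3000000000000p-145'
>>> is_single_precision(float.fromhex('0x13p-149'))  # a representable float32 subnormal
False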
While not as fast as @Martijn Pieters' regex-based solution, the speed is still respectable (about half as fast). Here are timings, where is_single_precision_re_hex is exactly the version from Martijn's answer:
>>> timeit.Timer('is_single_precision_numpy(f)', 'f = 1.2345678901e+26; from __main__ import is_single_precision_numpy').repeat(3, 10**6)
[2.035495020012604, 2.0115931580075994, 2.013475093001034]
>>> timeit.Timer('is_single_precision_re_hex(f)', 'f = 1.2345678901e+26; from __main__ import is_single_precision_re_hex').repeat(3, 10**6)
[1.1169273109990172, 1.1178153319924604, 1.1184561859990936]
Unfortunately, while almost all corner cases (subnormals, infinities, signed zeros, overflows, etc.) are handled correctly, there's one corner case that this solution won't work for: when floatval is a NaN. In that case, is_single_precision_numpy will return False. That may or may not matter for your needs. If it does matter, adding an extra isnan check should do the trick:
import math
import numpy as np

def is_single_precision_numpy(floatval, _float32=np.float32, _isnan=math.isnan):
    # NaN compares unequal to everything, including itself, so check for it
    # explicitly rather than relying on the float32 comparison.
    return _float32(floatval) == floatval or _isnan(floatval)
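A quick check of the fixed version, probing the float32 subnormal boundary and NaN (my own test values):

>>> is_single_precision_numpy(float('nan'))
True
>>> is_single_precision_numpy(2.0 ** -149)  # smallest positive float32 subnormal
True
>>> is_single_precision_numpy(2.0 ** -150)  # rounds to zero in float32
False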