
When would a Python float lose precision when cast to Protobuf/C++ float?

I'm interested in minimising the size of a protobuf message serialised from Python.

Protobuf has floats (4 bytes) and doubles (8 bytes). Python has a float type that's actually a C double, at least in CPython.

My question is: given an instance of a Python float, is there a "fast" way of checking if the value would lose precision if it was assigned to a protobuf float (or really a C++ float) ?

MarkNS asked Jan 29 '18 16:01



3 Answers

You can convert the float to its hex representation; the sign, exponent and fraction each get a separate section. Your 64-bit double fits in a 32-bit single only if the fraction uses just the first 6 hex digits (the remaining 7 digits must be zero) and the 6th digit is even (so the last bit is not set). The exponent is limited to a value between -126 and 127:

import math
import re

def is_single_precision(
        f,
        _isfinite=math.isfinite,
        _singlepat=re.compile(
            r'-?0x[01]\.[0-9a-f]{5}[02468ace]0{7}p'
            r'(?:\+(?:1[01]\d|12[0-7]|[1-9]\d|\d)|'
            r'-(?:1[01]\d|12[0-6]|[1-9]\d|\d))$').match):
    return not _isfinite(f) or _singlepat(f.hex()) is not None or f == 0.0

The float.hex() method is quite fast, faster than roundtripping via struct or numpy; you can create 1 million hex representations in under half a second:

>>> import timeit
>>> timeit.Timer('(1.2345678901e+26).hex()').autorange()
(1000000, 0.47934128501219675)

The regex engine is also pretty fast, and with name lookups optimised in the function above we can test 1 million float values in about 1.1 seconds:

>>> import random, sys
>>> testvalues = [0.0, float('inf'), float('-inf'), float('nan')] + [random.uniform(sys.float_info.min, sys.float_info.max) for _ in range(2 * 10 ** 6)]
>>> timeit.Timer('is_single_precision(f())', 'from __main__ import is_single_precision, testvalues; f = iter(testvalues).__next__').autorange()
(1000000, 1.1044921400025487)

The above works because the binary32 format allots 23 bits to the fraction and 8 bits to the exponent. The exponent is stored with a bias rather than as a signed number, and for normal values it ranges from -126 to 127. The regex only allows the first 23 fraction bits to be set and requires the exponent to fall within that range.
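The same rule can be checked on the raw bit pattern instead of the hex string. This is a minimal sketch, assuming only the normal binary32 range matters (as with the regex); the function name fits_binary32_normal is made up for illustration:

```python
import struct

def fits_binary32_normal(x):
    """Bit-level counterpart of the hex/regex check (hypothetical helper).

    Like the regex, it rejects values in the binary32 subnormal range.
    """
    # Reinterpret the 64-bit double as an unsigned integer.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    biased_exp = (bits >> 52) & 0x7FF      # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)      # 52 fraction bits
    if biased_exp == 0x7FF:                # inf or nan
        return True
    if biased_exp == 0 and fraction == 0:  # +0.0 or -0.0
        return True
    exp = biased_exp - 1023                # remove the binary64 bias
    # binary32 keeps 23 fraction bits, so the low 52 - 23 = 29 bits must be 0.
    low_bits_clear = fraction & ((1 << 29) - 1) == 0
    return low_bits_clear and -126 <= exp <= 127
```

For instance, 1 + 2**-23 passes because it only sets the 23rd fraction bit, while 1 + 2**-24 needs a 24th bit and fails.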

Also see

  • IEEE 754 single-precision binary floating-point format: binary32
  • IEEE 754 double-precision binary floating-point format: binary64

This may not be what you want, however! Take for example 1/3 or 1/10. Both can only be approximated in binary floating point, and both fail the test:

>>> (1/3).hex()
'0x1.5555555555555p-2'
>>> (1/10).hex()
'0x1.999999999999ap-4'

You may instead have to take a heuristic approach: if your hex value has all zeros in the first 6 digits of the fraction, or an exponent outside of the (-126, 127) range, converting to a single-precision float would lead to too much loss.
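One possible reading of such a heuristic is to accept a value when rounding it to binary32 costs no more than some relative error. The sketch below assumes that interpretation; the helper names and the rel_tol default of 1e-7 are illustrative choices. Within the normal binary32 range, rounding to nearest introduces a relative error of at most 2**-24 (about 6e-8), so this tolerance mainly rejects values that overflow or fall into the subnormal range:

```python
import math
import struct

_F32 = struct.Struct('f')  # native single-precision float

def float32_relative_error(x):
    """Relative error from rounding x to binary32 (hypothetical helper)."""
    if x == 0.0 or not math.isfinite(x):
        return 0.0                      # zeros, inf and nan survive as-is
    try:
        rounded = _F32.unpack(_F32.pack(x))[0]
    except OverflowError:               # finite, but too large for binary32
        return math.inf
    return abs(rounded - x) / abs(x)

def close_enough_as_float32(x, rel_tol=1e-7):
    # rel_tol is an assumed threshold; tune it to the application.
    return float32_relative_error(x) <= rel_tol
```

With this check, 1/3 and 1/10 are accepted, while 1e300 (overflow) and 1e-50 (underflows to zero) are rejected.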

Martijn Pieters answered Nov 14 '22 22:11


For completeness, here is the "round tripping through struct" method mentioned in the comments, which has the benefit of not requiring numpy but still giving accurate results:

import struct, math
def is_single_precision_struct(x, _s=struct.Struct("f")):
    return math.isnan(x) or _s.unpack(_s.pack(x))[0] == x
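Note that struct raises OverflowError when asked to pack a finite value that is too large for the "f" format, so the function above raises for inputs such as 1e300 instead of returning False. A minimal sketch of a variant that treats overflow as "does not fit" (the name is_single_precision_struct_safe is made up for illustration):

```python
import math
import struct

_F32 = struct.Struct("f")  # native single-precision float

def is_single_precision_struct_safe(x):
    """Round-trip check that reports overflow as False (hypothetical name)."""
    if math.isnan(x):
        return True                     # nan != nan, so handle it up front
    try:
        return _F32.unpack(_F32.pack(x))[0] == x
    except OverflowError:               # finite value outside binary32 range
        return False
```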

Time comparison against is_single_precision_numpy():

  • is_single_precision_numpy(f): [2.5650789737701416, 2.5488431453704834, 2.551704168319702]
  • is_single_precision_struct(f): [0.3972139358520508, 0.39684605598449707, 0.39119601249694824]

So it also seems to be faster on my machine.

jpa answered Nov 14 '22 22:11


If you want a simple solution that covers almost all corner cases, and will correctly detect out-of-range exponents as well as loss of information from the smaller precision, you can use NumPy to convert your potential float into an np.float32 object, then compare with the original:

import numpy as np

def is_single_precision_numpy(floatval, _float32=np.float32):
    return _float32(floatval) == floatval

This automatically takes care of potentially problematic cases like values that are in the float32 subnormal range. For example:

>>> is_single_precision_numpy(float.fromhex('0x13p-149'))
True
>>> is_single_precision_numpy(float.fromhex('0x13.8p-149'))
False

Those cases are harder to deal with easily with the hex-based solution.

While not as fast as @Martijn Pieters' regex-based solution, the speed is still respectable (about half as fast as the regex-based solution). Here are timings (where is_single_precision_re_hex is exactly the version from Martijn's answer).

>>> timeit.Timer('is_single_precision_numpy(f)', 'f = 1.2345678901e+26; from __main__ import is_single_precision_numpy').repeat(3, 10**6)
[2.035495020012604, 2.0115931580075994, 2.013475093001034]
>>> timeit.Timer('is_single_precision_re_hex(f)', 'f = 1.2345678901e+26; from __main__ import is_single_precision_re_hex').repeat(3, 10**6)
[1.1169273109990172, 1.1178153319924604, 1.1184561859990936]

Unfortunately, while almost all corner cases (subnormals, infinities, signed zeros, overflows, etc.) are handled correctly, there's one corner case that this solution won't work for: the case that floatval is a NaN. In that case, is_single_precision_numpy will return False. That may or may not matter for your needs. If it does matter, then adding an extra isnan check should do the trick:

import math

import numpy as np

def is_single_precision_numpy(floatval, _float32=np.float32, _isnan=math.isnan):
    return _float32(floatval) == floatval or _isnan(floatval)
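
If NumPy is not available, the same corner cases (float32 subnormals, overflow, NaN) can also be covered with the standard library alone. The following is a minimal sketch rather than a tuned implementation; is_single_precision_frexp and FLT_MAX are names chosen here for illustration. It relies on the fact that a finite value is representable in binary32 exactly when it lies within range and is an integer multiple of the binary32 spacing at its magnitude:

```python
import math

# Largest finite binary32 value: (2 - 2**-23) * 2**127.
FLT_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127

def is_single_precision_frexp(x):
    """Exact binary32 representability test (hypothetical helper)."""
    if not math.isfinite(x):
        return True                     # inf and nan both exist in binary32
    if abs(x) > FLT_MAX:
        return False                    # would overflow
    _, e = math.frexp(x)                # x == m * 2**e with 0.5 <= |m| < 1
    # Spacing of binary32 values near x is 2**ulp_exp; the exponent is
    # clamped at -126 so that subnormals share the spacing 2**-149.
    ulp_exp = max(e - 1, -126) - 23
    # Scaling by a power of two is exact here, so x fits in binary32
    # exactly when the scaled value is an integer.
    return math.ldexp(x, -ulp_exp).is_integer()
```

On the subnormal examples above, float.fromhex('0x13p-149') is accepted and float.fromhex('0x13.8p-149') is rejected.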

Mark Dickinson answered Nov 14 '22 23:11