A float
(a.k.a. single) value is a 4-byte value, and supposed to represent any real-valued number. Because of the way it is formatted and the finite number of bytes it is made off, there is a minimum value and a maximum value it can represent, and it has a finite precision, depending on it's own value.
I would like to know if there is a way to get the closest possible value above or below some reference value, given the finite precision of a float. With integers, this is trivial: one simply adds or subtracts 1. But with a float
, you can't simply add or subtract the minimum float value and expect it to be different from your original value. I.e.
float FindNearestSmaller (const float a)
{
return a - FLT_MIN; /* This doesn't necessarily work */
}
In fact, the above will almost never work. In the above case, the return will generally still equal a
, as the FLT_MIN
is far beyond the precision of a
. You can easily try this out for yourself: it works for e.g. 0.0f
, or for very small numbers of order FLT_MIN
, but not for anything between 0 and 100.
So how would you get the value that is closest but smaller or larger than a
, given floating point precision?
Note: Though i am mainly interested in a C/C++ answer, I assume the answer will be applicable for most programming languages.
The standard way to find a floating-point value's neighbors is the function nextafter
for double
and nextafterf
for float
. The second argument gives the direction. Remember that infinities are legal values in IEEE 754 floating-point, so you can very well call nextafter(x, +1.0/0.0)
to get the value immediately above x
, and this will work even for DBL_MAX
(whereas if you wrote nextafter(x, DBL_MAX)
, it would return DBL_MAX
when applied for x == DBL_MAX
).
Two non-standard ways that are sometimes useful are:
access the representation of the float
/double
as an unsigned integer of the same size, and increment or decrement this integer. The floating-point format was carefully designed so that for positive floats, and respectively for negative floats, the bits of the representation, seen as an integer, evolve monotonously with the represented float.
change the rounding mode to upward, and add the smallest positive floating-point number. The smallest positive floating-point number is also the smallest increment that there can be between two floats, so this will never skip any float. The smallest positive floating-point number is FLT_MIN * FLT_EPSILON
.
For the sake of completeness, I will add that even without changing the rounding mode from its “to nearest” default, multiplying a float by (1.0f + FLT_EPSILON)
produces a number that is either the immediate neighbor away from zero, or the neighbor after that. It is probably the cheapest if you already know the sign of the float you wish to increase/decrease and you don't mind that it sometimes does not produce the immediate neighbor. Functions nextafter
and nextafterf
are specified in such a way that a correct implementation on the x86 must test for a number of special values and FPU states, and is thus rather costly for what it does.
To go towards zero, multiply by 1.0f - FLT_EPSILON
.
This doesn't work for 0.0f
, obviously, and generally for the smaller denormalized numbers.
The values for which multiplying by 1.0f + FLT_EPSILON
advance by 2 ULPS are just below a power of two, specifically in the interval [0.75 * 2p … 2p). If you don't mind doing a multiplication and an addition, x + (x * (FLT_EPSILON * 0.74))
should work for all normal numbers (but still not for zero nor for all the small denormal numbers).
Look at the "nextafter" function, which is part of Standard C (and probably C++, but I didn't check).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With