In the context of static analysis, I am interested in determining the values of x
in the then-branch of the conditional below:
double x;
x = …;
if (x + a == b)
{
…
a
and b
can be assumed to be double-precision constants (generalizing to arbitrary expressions is the easiest part of the problem), and the compiler can be assumed to follow IEEE 754 strictly (FLT_EVAL_METHOD
is 0). The rounding mode at run-time can be assumed to be to nearest-even.
If computing with rationals was cheap, it would be simple: the values for x
would be the double-precision numbers contained in the rational interval (b - a - 0.5 * ulp1(b) … b - a + 0.5 * ulp2(b)). The bounds should be included if b
is even, excluded if b
is odd, and ulp1 and ulp2 are two slightly different definitions of “ULP” that can be taken identical if one does not mind losing a little precision on powers of two.
Unfortunately, computing with rationals can be expensive. Consider that another possibility is to obtain each of the bounds by dichotomy, in 64 double-precision additions (each operation deciding one bit of the result). 128 floating-point additions to obtain the lower and upper bounds may well be faster than any solution based on maths.
I am wondering if there is a way to improve over the “128 floating-point additions” idea. Actually I have my own solution involving changes of rounding mode and nextafter
calls, but I wouldn't want to cramp anyone's style and cause them to miss a more elegant solution than the one I currently have. Also I am not sure that changing the rounding mode twice is actually cheaper than 64 floating-point additions.
The smallest representable number in IEEE Double Precision format is approximately 4.95×10−324 which is many orders of magnitude smaller than 1.1102230246316×10−16.
A double-precision floating-point number is a 64-bit approximation of a real number. The number can be zero or can range from -1.7976931348623158e+308 to -2.2250738585072014e-308, or from 2.2250738585072014e-308 to 1.7976931348623158e+308.
How many IEEE double precision numbers are there between an adjacent pair of nonzero IEEE single precision numbers? The binary floating point representation of 2 is 1.02 ×21. Therefore the next larger double precision floating point number is (1 + 2−52) × 21, and the gap is 2−51.
The definition of a mantissa is the part of a number located after a decimal point. An example of mantissa is 234 in the number 1101.234. (mathematics) The part of a common logarithm after the decimal point, the fractional part of a logarithm.
You already gave a nice and elegant solution in your question:
If computing with rationals was cheap, it would be simple: the values for x would be the double-precision numbers contained in the rational interval (b - a - 0.5 * ulp1(b) … b - a + 0.5 * ulp2(b)). The bounds should be included if b is even, excluded if b is odd, and ulp1 and ulp2 are two slightly different definitions of “ULP” that can be taken identical if one does not mind losing a little precision on powers of two.
What follows is a half-reasoned sketch of a partial solution to the problem based on this paragraph. Hopefully I'll get a chance to flesh it out soon. To get a real solution, you'll have to handle subnormals, zeroes, NaNs, and all that other fun stuff. I'm going to assume that a
and b
are, say, such that 1e-300 < |a| < 1e300
and 1e-300 < |b| < 1e300
so that no craziness occurs at any point.
Absent overflow and underflow, you can get ulp1(b)
from b - nextafter(b, -1.0/0.0)
. You can get ulp2(b)
from nextafter(b, 1.0/0.0) - b
.
If b/2 <= a <= 2b
, then Sterbenz's theorem tells you that b - a
is exact. So (b - a) - ulp1 / 2
will be the closest double
to the lower bound and (b - a) + ulp2 / 2
will be the closest double
to the upper bound. Try these values, and the values immediately before and after, and pick the widest interval that works.
If b > 2a
, b - a > b/2
. The computed value of b - a
is off by at most half an ulp. One ulp1
is at most two ulp, as is one ulp2
, so the rational interval you gave is at most two ulp wide. Figure out which of the five closest values to b-a
work.
If a > 2b
, an ulp of b-a
is at least as big as an ulp of b
; if anything works, I bet it'll have to be be among the three closest values to b-a
. I imagine the case where a
and b
have different signs works similarly.
I wrote a small pile of C++ code implementing this idea. It didn't fail random fuzz testing (in a few different ranges) before I got bored of waiting. Here it is:
void addeq_range(double a, double b, double &xlo, double &xhi) {
if (a != a) return; // empty interval
if (b != b) {
if (a-a != 0) { xlo = xhi = -a; return; }
else return; // empty interval
}
if (b-b != 0) {
// TODO: handle me.
}
// b is now guaranteed to be finite.
if (a-a != 0) return; // empty interval
if (b < 0) {
addeq_range(-a, -b, xlo, xhi);
xlo = -xlo;
xhi = -xhi;
return;
}
// b is now guaranteed to be zero or positive finite and a is finite.
if (a >= b/2 && a <= 2*b) {
double upulp = nextafter(b, 1.0/0.0) - b;
double downulp = b - nextafter(b, -1.0/0.0);
xlo = (b-a) - downulp/2;
xhi = (b-a) + upulp/2;
if (xlo + a == b) {
xlo = nextafter(xlo, -1.0/0.0);
if (xlo + a != b) xlo = nextafter(xlo, 1.0/0.0);
} else xlo = nextafter(xlo, 1.0/0.0);
if (xhi + a == b) {
xhi = nextafter(xhi, 1.0/0.0);
if (xhi + a != b) xhi = nextafter(xhi, -1.0/0.0);
} else xhi = nextafter(xhi, -1.0/0.0);
} else {
double xmid = b-a;
if (xmid + a < b) {
xhi = xlo = nextafter(xmid, 1.0/0.0);
if (xhi + a != b) xhi = xmid;
} else if (xmid + a == b) {
xlo = nextafter(xmid, -1.0/0.0);
xhi = nextafter(xmid, 1.0/0.0);
if (xlo + a != b) xlo = xmid;
if (xhi + a != b) xhi = xmid;
} else {
xlo = xhi = nextafter(xmid, -1.0/0.0);
if (xlo + a != b) xlo = xmid;
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With