Fastest algorithm to identify the smallest and largest x that make the double-precision equation x + a == b true

Tags:

In the context of static analysis, I am interested in determining the values of x in the then-branch of the conditional below:

double x;
x = …;
if (x + a == b)
{
  …

a and b can be assumed to be double-precision constants (generalizing to arbitrary expressions is the easiest part of the problem), and the compiler can be assumed to follow IEEE 754 strictly (FLT_EVAL_METHOD is 0). The rounding mode at run-time can be assumed to be to nearest-even.

If computing with rationals was cheap, it would be simple: the values for x would be the double-precision numbers contained in the rational interval (b - a - 0.5 * ulp1(b) … b - a + 0.5 * ulp2(b)). The bounds should be included if b is even, excluded if b is odd, and ulp1 and ulp2 are two slightly different definitions of “ULP” that can be taken identical if one does not mind losing a little precision on powers of two.

Unfortunately, computing with rationals can be expensive. Consider that another possibility is to obtain each of the bounds by dichotomy, in 64 double-precision additions (each operation deciding one bit of the result). 128 floating-point additions to obtain the lower and upper bounds may well be faster than any solution based on maths.

I am wondering if there is a way to improve over the “128 floating-point additions” idea. Actually I have my own solution involving changes of rounding mode and nextafter calls, but I wouldn't want to cramp anyone's style and cause them to miss a more elegant solution than the one I currently have. Also I am not sure that changing the rounding mode twice is actually cheaper than 64 floating-point additions.

899

asked Jun 14 '14 18:06

Pascal Cuoq

1 Answers

You already gave a nice and elegant solution in your question:

If computing with rationals was cheap, it would be simple: the values for x would be the double-precision numbers contained in the rational interval (b - a - 0.5 * ulp1(b) … b - a + 0.5 * ulp2(b)). The bounds should be included if b is even, excluded if b is odd, and ulp1 and ulp2 are two slightly different definitions of “ULP” that can be taken identical if one does not mind losing a little precision on powers of two.

What follows is a half-reasoned sketch of a partial solution to the problem based on this paragraph. Hopefully I'll get a chance to flesh it out soon. To get a real solution, you'll have to handle subnormals, zeroes, NaNs, and all that other fun stuff. I'm going to assume that a and b are, say, such that 1e-300 < |a| < 1e300 and 1e-300 < |b| < 1e300 so that no craziness occurs at any point.

Absent overflow and underflow, you can get ulp1(b) from b - nextafter(b, -1.0/0.0). You can get ulp2(b) from nextafter(b, 1.0/0.0) - b.

If b/2 <= a <= 2b, then Sterbenz's theorem tells you that b - a is exact. So (b - a) - ulp1 / 2 will be the closest double to the lower bound and (b - a) + ulp2 / 2 will be the closest double to the upper bound. Try these values, and the values immediately before and after, and pick the widest interval that works.

If b > 2a, b - a > b/2. The computed value of b - a is off by at most half an ulp. One ulp1 is at most two ulp, as is one ulp2, so the rational interval you gave is at most two ulp wide. Figure out which of the five closest values to b-a work.

If a > 2b, an ulp of b-a is at least as big as an ulp of b; if anything works, I bet it'll have to be be among the three closest values to b-a. I imagine the case where a and b have different signs works similarly.

I wrote a small pile of C++ code implementing this idea. It didn't fail random fuzz testing (in a few different ranges) before I got bored of waiting. Here it is:

void addeq_range(double a, double b, double &xlo, double &xhi) {
  if (a != a) return; // empty interval
  if (b != b) {
    if (a-a != 0) { xlo = xhi = -a; return; }
    else return; // empty interval
  }
  if (b-b != 0) {
    // TODO: handle me.
  }

  // b is now guaranteed to be finite.
  if (a-a != 0) return; // empty interval

  if (b < 0) {
    addeq_range(-a, -b, xlo, xhi);
    xlo = -xlo;
    xhi = -xhi;
    return;
  }

  // b is now guaranteed to be zero or positive finite and a is finite.
  if (a >= b/2 && a <= 2*b) {
    double upulp = nextafter(b, 1.0/0.0) - b;
    double downulp = b - nextafter(b, -1.0/0.0);
    xlo = (b-a) - downulp/2;
    xhi = (b-a) + upulp/2;
    if (xlo + a == b) {
      xlo = nextafter(xlo, -1.0/0.0);
      if (xlo + a != b) xlo = nextafter(xlo, 1.0/0.0);
    } else xlo = nextafter(xlo, 1.0/0.0);
    if (xhi + a == b) {
      xhi = nextafter(xhi, 1.0/0.0);
      if (xhi + a != b) xhi = nextafter(xhi, -1.0/0.0);
    } else xhi = nextafter(xhi, -1.0/0.0);
  } else {
    double xmid = b-a;
    if (xmid + a < b) {
      xhi = xlo = nextafter(xmid, 1.0/0.0);
      if (xhi + a != b) xhi = xmid;
    } else if (xmid + a == b) {
      xlo = nextafter(xmid, -1.0/0.0);
      xhi = nextafter(xmid, 1.0/0.0);
      if (xlo + a != b) xlo = xmid;
      if (xhi + a != b) xhi = xmid;
    } else {
      xlo = xhi = nextafter(xmid, -1.0/0.0);
      if (xlo + a != b) xlo = xmid;
    }
  }
}

103

answered Oct 05 '22 09:10

tmyklebu

Related questions
                            
                                Why is netcat unable to receive the second broadcast message?
                            
                                How do I turn off '*' in multi-line comments in Eclipse?
                            
                                How to send Ctrl-C control character or terminal hangup message to child process?
                            
                                ncurses transparent console background
                            
                                C++ queue with dependencies
                            
                                Is the data in siginfo trustworthy?
                            
                                Is explicitly clearing/zeroing sensitive variables after use sensible?
                            
                                Compiling functional languages to C
                            
                                Are there any C# to C converter tools? [closed]
                            
                                Naming conventions for Ruby C extension developers
                            
                                Properties of 80-bit extended precision computations starting from double precision arguments
                            
                                Convert a statically linked elf binary to dynamically linked
                            
                                How to free json_object?
                            
                                cpack cannot find libraries, target doesn't exist in this directory
                            
                                int promotion: Is the following well-defined?
                            
                                using getaddrinfo() only checks nscd cache first time if DNS times out
                            
                                Do unnamed bit-fields have well-defined semantics?
                            
                                Since glPushAttrib/glPopAttrib are deprecated, what is the new way to save attributes like GL_DEPTH_FUNC?
                            
                                Is fgets() returning NULL with a short buffer compliant?
                            
                                Pthreads and opaque types [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fastest algorithm to identify the smallest and largest x that make the double-precision equation x + a == b true

Tags:

c

floating-point

ieee-754

Pascal Cuoq

People also ask

1 Answers

tmyklebu

Recent Activity

Donate For Us