Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Skipping "test for zero" checks in IEEE 754

Tags:

c

ieee-754

I read this in a popular book about computer graphics,

There are many numeric computations that become much simpler if the programmer takes advantage of the IEEE rules. For example, consider the expression:

a = 1 / (1/b + 1/c)

Such expressions arise with resistors and lenses. If divide-by-zero resulted in a program crash (as was true in many systems before IEEE floating-point), then two if statements would be required to check for small or zero values of b or c. Instead, with IEEE floating-point, if b or c is zero, we will get a zero value for a as desired.

But what about the case where b=+0 and c=-0? Then a=1/inf-inf=nan. Is the book wrong about this optimizations, or have I misunderstood something? It seems like we'll still need at least one check for the signs of b & c rather than no checks at all.

Edit One suggestion in the comments was just to do a post-check for NaN. Is that the idiomatic way to "take advantage" of these number type?

bool is_nan(float x) { return x != x; }

float optic_thing(float b, float c) {
  float result = 1.0f / (1.0f/b + 1.0f/c);
  if (is_nan(result)) result = 0;
  return result;
}
like image 206
dune.rocks Avatar asked Oct 30 '22 21:10

dune.rocks


1 Answers

But what about the case where b=+0 and c=-0?

Convert -0.0 to +0.0 without branching by adding 0.0.

int main(void) {
  double b = +0.0;
  double c = -0.0;
  printf("%e\n", b);
  printf("%e\n", c);
  printf("%e\n", 1.0/(1.0/b + 1.0/c));
  b += 0.0;
  c += 0.0;
  printf("%e\n", 1.0/(1.0/b + 1.0/c));
  return 0;
}

Output

0.000000e+00
-0.000000e+00
nan
0.000000e+00

[Edit] On the other hand, any values near 0.0, but not 0.0, are likely numeric artifacts and not accurate resistance values. The above still has trouble with tiny value pairs like b = DBL_TRUE_MIN; c = -b; The issue is 1.0/tiny --> Infinity and +Infinity + -Infinity --> NAN. Could resort to either using wider floating point for the quotient calculation or narrow the operands.

  b += 0.0;
  c += 0.0;
  printf("%Le\n", 1.0/(1.0L/b + 1.0L/c));

  // Lose some precision, but we are talking about real resistors/lenses here.
  b = (float) b + 0.0f;
  c = (float) c + 0.0f;
  printf("%e\n", 1.0/(1.0/b + 1.0/c));
like image 177
chux - Reinstate Monica Avatar answered Nov 15 '22 05:11

chux - Reinstate Monica