 

floating point product expansion equivalence

In IEEE 754 floating point, it is possible that

a*(b-c) != a*b-a*c // a, b, c double

So expansion of a product is not guaranteed to be equal to the unexpanded product.

But what about this:

a*(b1+b2+...+bn) == a*b1+a*b2+...+a*bn // b1==b2==...==bn

When all b are equal, is the equivalence guaranteed (assuming no under- or overflow)? Does it make a difference whether the equality of the b's is known at compile time?

Edit:

It is not - see Eric Postpischil and Pascal Cuoq.

But does this weaker assertion hold?:

   (1.0/n)*(b1+b2+...+bn) <= 1.0
&& (1.0/n)*b1+(1.0/n)*b2+...+(1.0/n)*bn <= 1.0

// when all b<=1.0 and n integral double but not power of 2
// so that 1.0/n not exactly representable with base-2 floating point

I simply wonder whether you can guarantee that the computed average of a data set never exceeds a bound that no single data value exceeds, no matter how the average is computed (summing everything and dividing once, or summing the values each divided by n).

Edit2:

Ok, the && doesn't hold. See Eric Postpischil and David Hammen:

average of nine 1.0 -> only the first condition holds; the term-by-term sum exceeds 1.0.
average of ten 1.0/3 -> only the second condition holds; the sum-then-divide average exceeds every single data value.

Does the optimal way to compute an average then depend on the expected upper bound of the data? Or also on the size n of the data set? Or is there no single method that is always optimal?

asked Feb 10 '14 by mb84



3 Answers

Even excluding overflow and underflow, the results may differ.

.3 * (.3+.3+.3) is 0.269999999999999962252417162744677625596523284912109375 while .3*.3 + .3*.3 + .3*.3 is 0.270000000000000017763568394002504646778106689453125 (when both are evaluated with IEEE-754 rules and 64-bit binary floating-point).

Regarding the questions added in an update:

There are two questions, one of which asks whether the computed average of a set of numbers, each not exceeding one, can exceed one. As David Hammen points out, calculating the average of nine 1s as 1./9*1 + 1./9*1 + 1./9*1 + 1./9*1 + 1./9*1 + 1./9*1 + 1./9*1 + 1./9*1 + 1./9*1 produces 1.0000000000000002220446049250313080847263336181640625.

The other asks whether the computed average of a set can exceed all of the numbers in the set. This is a superset of the first question, but here is an example using a different calculation of the average:

Let b = 1./3 (which is 0.333333333333333314829616256247390992939472198486328125).

Then 1./10 * (b+b+b+b+b+b+b+b+b+b) is 0.33333333333333337034076748750521801412105560302734375, which is greater than b.

answered Oct 23 '22 by Eric Postpischil

IEEE 754 doesn't delve too much into language details. In particular, details like "compile time" aren't specified. That doesn't even make sense for some languages.

The easiest case to understand is an intermediate infinity. Assume the exact sum of the b's barely overflows, so sum(b) == INF, while a is 0.5. The result a*sum(b) is still INF. However, by multiplying each term by a first, the partial sums stay in range and the additions no longer overflow.

answered Oct 23 '22 by MSalters


But maybe holds the weaker assertion?:

(1.0/n)*(b1+b2+...+bn) <= 1.0
&& (1.0/n)*b1+(1.0/n)*b2+...+(1.0/n)*bn <= 1.0

No. For example, this assertion fails on my computer with n=9 and bi=1.0.

I simply wonder if you can guarantee that the average of a data set does not exceed some value that is also not exceeded by every single data value, no matter how you compute the average (first adding and once dividing, or adding every value divided).

Once again, the answer is no. The correlation E[(X-Xbar)*(Y-Ybar)]/(sigma_x * sigma_y) between two random variables should always be between -1.0 and 1.0. Yet if you compute the statistics for two perfectly correlated (or perfectly anti-correlated) random variables, you'll oftentimes see a correlation that is slightly greater than +1 (or slightly less than -1).

answered Oct 23 '22 by David Hammen