If I want to take the product of a list of floating point numbers, what's the worst-case/average-case precision lost by adding their logs and then taking exp of the sum as opposed to just multiplying them. Is there ever a case when this is actually more precise?
For background, the reason the log-sum-exp trick works at all is that logarithms turn multiplication into addition: y = b^x and x = log_b(y) encode the same relationship between the base b and the numbers x and y, and since b^x * b^y = b^(x+y), we get log_b(xy) = log_b(x) + log_b(y). In words, the log of a product is the sum of the logs; likewise, the difference of two logs is the log of the quotient. These laws hold for any base, as long as the same base is used throughout the calculation.
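As a quick sanity check of that identity in floating point (a minimal sketch; the particular values are arbitrary):

#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 3.7, b = 42.0;           /* arbitrary positive factors */
    double direct = log(a * b);         /* log of the product */
    double summed = log(a) + log(b);    /* sum of the logs */
    /* The two agree up to floating-point rounding error. */
    printf("log(a*b)      = %.17g\n", direct);
    printf("log(a)+log(b) = %.17g\n", summed);
    return 0;
}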
Absent any overflow or underflow shenanigans, if a and b are floating-point numbers, then the product a*b will be computed to within a relative error of 1/2 ulp.
A crude bound on the relative error after multiplying a chain of N doubles therefore gives an answer off by a factor of at most (1 - epsilon/2)^(-N), which is about exp(epsilon N / 2) (to first order, a worst-case relative error on the order of N epsilon). I'd imagine you can expect a deviation of around epsilon sqrt(N) in the average case.
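If you want to see that growth empirically, something like the following rough sketch works (it assumes long double carries more precision than double, which is true for x86 glibc but not, say, MSVC; the factor range and N are arbitrary choices):

#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int N = 100000;
    double prod = 1.0;
    long double ref = 1.0L;   /* higher-precision reference product */
    srand(12345);
    for (int i = 0; i < N; i++) {
        /* random factors near 1 so the running product stays in range */
        double x = 0.9 + 0.2 * (double)rand() / RAND_MAX;
        prod *= x;
        ref *= (long double)x;
    }
    /* relative error of the double product against the long double reference */
    long double relerr = fabsl((long double)prod - ref) / fabsl(ref);
    printf("relative error after %d multiplications: %Lg\n", N, relerr);
    printf("epsilon * sqrt(N) for scale: %g\n", DBL_EPSILON * sqrt((double)N));
    return 0;
}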
Exponent overflow and underflow are more likely to happen with this strategy, though; you're more likely to get infinities, zeroes, and NaNs as well as imprecise values due to rounding of subnormals.
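To see that failure mode concretely, here is a case where the true product is perfectly representable but the running product overflows partway through (the values are chosen only so that the first two factors alone exceed DBL_MAX):

#include <math.h>
#include <stdio.h>

int main(void) {
    /* True product is about 1.0, but the first two factors alone overflow. */
    double foo[] = {1e300, 1e300, 1e-300, 1e-300};
    double prod = 1.0, sumlogs = 0.0;
    for (size_t i = 0; i < sizeof(foo) / sizeof(*foo); i++) {
        prod *= foo[i];          /* becomes +inf after the second factor */
        sumlogs += log(foo[i]);  /* stays near 0, no overflow */
    }
    printf("direct product:   %g\n", prod);          /* prints inf */
    printf("exp(sum of logs): %g\n", exp(sumlogs));  /* prints roughly 1 */
    return 0;
}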
The other approach is more robust in that sense, but it is much slower and errs worse in the case where the straightforward approach doesn't result in an overflow or underflow. Here's a very, very crude analysis for standard doubles in the case where N is at least a couple orders of magnitude smaller than 2^53:
You can always take the log of a finite floating-point number and get a finite floating-point number, so we're cool there. You can add up N floating-point numbers either straightforwardly, to get N epsilon worst-case "relative" error and sqrt(N) epsilon expected "relative" error, or using Kahan summation, to get about 3 epsilon worst-case "relative" error. Scare quotes are around "relative" because the error is relative to the sum of the absolute values of the things you're summing.
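For reference, compensated (Kahan) summation is only a few lines; here's a minimal sketch (the function name kahan_sum is mine, not from any library):

#include <stddef.h>

/* Compensated summation: a running correction term c recovers the low-order
   bits lost at each addition, so the worst-case error stays at a few epsilon
   (relative to the sum of absolute values) instead of growing like N epsilon. */
double kahan_sum(const double *x, size_t n) {
    double sum = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;   /* apply the correction to the next term */
        double t = sum + y;    /* low-order bits of y are lost here... */
        c = (t - sum) - y;     /* ...and captured here for the next round */
        sum = t;
    }
    return sum;
}

Note that aggressive floating-point optimizations such as -ffast-math are allowed to reassociate these operations and optimize the correction away, so compile without them.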
Notice that no finite double has a logarithm whose absolute value is bigger than 710 or so. That means our sum-of-logarithms computed using Kahan summation has an absolute error of at most 2130 N epsilon (roughly 3 epsilon times a sum of absolute values that is at most about 710 N). When we exponentiate our sum-of-logarithms, we get something off by a factor of at most exp(2130 N epsilon) from the right answer.
A pathological example for the log-sum-exp approach:
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Two factors whose product is close to 1 but whose logs are huge. */
    double foo[] = {0x1.000000000018cp1023, 0x1.0000000000072p-1023};
    double prod = 1;
    double sumlogs = 0;
    for (size_t i = 0; i < sizeof(foo) / sizeof(*foo); i++) {
        prod *= foo[i];
        sumlogs += log(foo[i]);
    }
    printf("%a %a\n", foo[0], foo[1]);
    printf("%a %a %a\n", prod, exp(sumlogs), prod - exp(sumlogs));
    return 0;
}
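(If you try this yourself on Linux, remember to link the math library, e.g. by passing -lm to the compiler.)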
On my platform, I get a difference of 0x1.fep-44. I'm sure there are worse examples.