Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Harmonic Mean Calculation and Float Precision

I'm implementing the Pythagorean means in PHP, the arithmetic and geometric means are a piece of cake but I'm having a really hard time coming up with a reliable harmonic mean implementation.

This is the WolframAlpha definition:

Harmonic Mean Definition from WolframAlpha


And this is the equivalent implementation in PHP:

function harmonicMeanV1()
{
    $result = 0;
    $arguments = func_get_args();

    foreach ($arguments as $argument)
    {
        $result += 1 / $argument;
    }

    return func_num_args() / $result;
}

Now, if any of the arguments is 0 this will throw a division by 0 warning, but since 1 / n is the same as n-1 and pow(0, -1) gracefully returns the INF constant without throwing any errors I could rewrite that to the following (it'll still throw errors if there are no arguments, but lets ignore that for now):

function harmonicMeanV2()
{
    $arguments = func_get_args();
    $arguments = array_map('pow', $arguments, array_fill(0, count($arguments), -1));

    return count($arguments) / array_sum($arguments);
}

Both implementations work fine for most cases (example v1, v2 and WolframAlpha), but they fail spectacularly if the sum of the 1 / ni series is 0, I should get another division by 0 warning, but I don't...

Consider the following set: -2, 3, 6 (WolframAlpha says it's a complex infinite):

  1 / -2    // -0.5
+ 1 / 3     // 0.33333333333333333333333333333333
+ 1 / 6     // 0.16666666666666666666666666666667

= 0

However, both my implementations return -2.7755575615629E-17 as the sum (v1, v2) instead of 0.

While the return result on CodePad is -108086391056890000 my dev machine (32-bit) says it's -1.0808639105689E+17, still it's nothing like the 0 or INF I was expecting. I even tried calling is_infinite() on the return value, but it came back as false as expected.

I also found the stats_harmonic_mean() function that's part of the stats PECL extension, but to my suprise I got the exact same buggy result: -1.0808639105689E+17, if any of the arguments is 0, 0 is returned but no checks are done to the sum of the series, as you can see on line 3585:

3557    /* {{{ proto float stats_harmonic_mean(array a)
3558       Returns the harmonic mean of an array of values */
3559    PHP_FUNCTION(stats_harmonic_mean)
3560    {
3561        zval *arr;
3562        double sum = 0.0;
3563        zval **entry;
3564        HashPosition pos;
3565        int elements_num;
3566    
3567        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a",  &arr) == FAILURE) {
3568            return;
3569        }
3570        if ((elements_num = zend_hash_num_elements(Z_ARRVAL_P(arr))) == 0) {
3571            php_error_docref(NULL TSRMLS_CC, E_WARNING, "The array has zero elements");
3572            RETURN_FALSE;
3573        }
3574    
3575        zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(arr), &pos);
3576        while (zend_hash_get_current_data_ex(Z_ARRVAL_P(arr), (void **)&entry, &pos) == SUCCESS) {
3577            convert_to_double_ex(entry);
3578            if (Z_DVAL_PP(entry) == 0) {
3579                RETURN_LONG(0);
3580            }
3581            sum += 1 / Z_DVAL_PP(entry);
3582            zend_hash_move_forward_ex(Z_ARRVAL_P(arr), &pos);   
3583        }
3584    
3585        RETURN_DOUBLE(elements_num / sum);
3586    }
3587    /* }}} */

This looks like a typical floating precision bug, but I can't really understand the reason why since the individual calculations are quite precise:

Array
(
    [0] => -0.5
    [1] => 0.33333333333333
    [2] => 0.16666666666667
)

Is it possible to work around this problem without reverting to the gmp / bcmath extensions?

like image 328
Alix Axel Avatar asked May 04 '12 03:05

Alix Axel


Video Answer


2 Answers

You are correct. The numbers you are finding are an artifact of the peculiarities of floating-point arithmetic.

Adding more precision will not help you. All you're doing is moving the goal posts.

The bottom line is that the calculations are done with finite precision. That means that at some point, an intermediate result will be rounded. That intermediate result is then no longer exact. The error propagates through the calculations, and eventually makes it into your final result. When the exact result is zero, you usually get a numeric result of around 1e-16 with double-precision numbers.

This happens every time your calculation involves a fraction with a denominator that is not a power of 2.

The only way around it is to express the calculations in terms of integers or rational numbers (if you can), and use an arbitrary precision integer package to do the calculations. This is what Wolfram|Alpha does.

Note that the calculation of the geometric mean is not trivial, either. Try a sequence of 20 times 1e20. Since the numbers are all the same, the result should be 1e20. But what you'll find is that the result is infinite. The reason is that the product of those 20 numbers (10e400) is outside the range of double-precision floating-point numbers, and so it's set to infinity. The 20th root of infinity is still infinity.

Finally, a meta-observation: the Pythogarian means really only make sense for positive numbers. What is the geometric mean of 3 and -3? Is it imaginary?? The chain of inequalities on the Wikipedia page you link to are only valid if all values are positive.

like image 57
Jeffrey Sax Avatar answered Oct 03 '22 00:10

Jeffrey Sax


Yes, this is a problem with floating point precision. -1/2 can be represented exactly, but 1/3 and 1/6 can't. Thus when you add them up, you don't quite get zero.

You can do the use-a-common-denominator approach you mentioned (the H2 and H3 formulas you posted), but that just kicks the can down the road a bit, you'll still get inaccurate results once the sum-of-products term starts rounding off.

Why are you taking the harmonic mean of numbers that might be negative, anyway? That's an inherently unstable calculation (H(-2,3,6+epsilon) varies widely for very small epsilon).

like image 25
Keith Randall Avatar answered Oct 03 '22 00:10

Keith Randall