I'm implementing the Pythagorean means in PHP, the arithmetic and geometric means are a piece of cake but I'm having a really hard time coming up with a reliable harmonic mean implementation.
This is the WolframAlpha definition:
And this is the equivalent implementation in PHP:
function harmonicMeanV1()
{
$result = 0;
$arguments = func_get_args();
foreach ($arguments as $argument)
{
$result += 1 / $argument;
}
return func_num_args() / $result;
}
Now, if any of the arguments is 0
this will throw a division by 0 warning, but since 1 / n
is the same as n-1 and pow(0, -1)
gracefully returns the INF
constant without throwing any errors I could rewrite that to the following (it'll still throw errors if there are no arguments, but lets ignore that for now):
function harmonicMeanV2()
{
$arguments = func_get_args();
$arguments = array_map('pow', $arguments, array_fill(0, count($arguments), -1));
return count($arguments) / array_sum($arguments);
}
Both implementations work fine for most cases (example v1, v2 and WolframAlpha), but they fail spectacularly if the sum of the 1 / ni series is 0, I should get another division by 0 warning, but I don't...
Consider the following set: -2, 3, 6
(WolframAlpha says it's a complex infinite):
1 / -2 // -0.5
+ 1 / 3 // 0.33333333333333333333333333333333
+ 1 / 6 // 0.16666666666666666666666666666667
= 0
However, both my implementations return -2.7755575615629E-17
as the sum (v1, v2) instead of 0
.
While the return result on CodePad is -108086391056890000
my dev machine (32-bit) says it's -1.0808639105689E+17
, still it's nothing like the 0
or INF
I was expecting. I even tried calling is_infinite()
on the return value, but it came back as false
as expected.
I also found the stats_harmonic_mean()
function that's part of the stats
PECL extension, but to my suprise I got the exact same buggy result: -1.0808639105689E+17
, if any of the arguments is 0
, 0
is returned but no checks are done to the sum of the series, as you can see on line 3585:
3557 /* {{{ proto float stats_harmonic_mean(array a)
3558 Returns the harmonic mean of an array of values */
3559 PHP_FUNCTION(stats_harmonic_mean)
3560 {
3561 zval *arr;
3562 double sum = 0.0;
3563 zval **entry;
3564 HashPosition pos;
3565 int elements_num;
3566
3567 if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a", &arr) == FAILURE) {
3568 return;
3569 }
3570 if ((elements_num = zend_hash_num_elements(Z_ARRVAL_P(arr))) == 0) {
3571 php_error_docref(NULL TSRMLS_CC, E_WARNING, "The array has zero elements");
3572 RETURN_FALSE;
3573 }
3574
3575 zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(arr), &pos);
3576 while (zend_hash_get_current_data_ex(Z_ARRVAL_P(arr), (void **)&entry, &pos) == SUCCESS) {
3577 convert_to_double_ex(entry);
3578 if (Z_DVAL_PP(entry) == 0) {
3579 RETURN_LONG(0);
3580 }
3581 sum += 1 / Z_DVAL_PP(entry);
3582 zend_hash_move_forward_ex(Z_ARRVAL_P(arr), &pos);
3583 }
3584
3585 RETURN_DOUBLE(elements_num / sum);
3586 }
3587 /* }}} */
This looks like a typical floating precision bug, but I can't really understand the reason why since the individual calculations are quite precise:
Array
(
[0] => -0.5
[1] => 0.33333333333333
[2] => 0.16666666666667
)
Is it possible to work around this problem without reverting to the gmp
/ bcmath
extensions?
You are correct. The numbers you are finding are an artifact of the peculiarities of floating-point arithmetic.
Adding more precision will not help you. All you're doing is moving the goal posts.
The bottom line is that the calculations are done with finite precision. That means that at some point, an intermediate result will be rounded. That intermediate result is then no longer exact. The error propagates through the calculations, and eventually makes it into your final result. When the exact result is zero, you usually get a numeric result of around 1e-16 with double-precision numbers.
This happens every time your calculation involves a fraction with a denominator that is not a power of 2.
The only way around it is to express the calculations in terms of integers or rational numbers (if you can), and use an arbitrary precision integer package to do the calculations. This is what Wolfram|Alpha does.
Note that the calculation of the geometric mean is not trivial, either. Try a sequence of 20 times 1e20. Since the numbers are all the same, the result should be 1e20. But what you'll find is that the result is infinite. The reason is that the product of those 20 numbers (10e400) is outside the range of double-precision floating-point numbers, and so it's set to infinity. The 20th root of infinity is still infinity.
Finally, a meta-observation: the Pythogarian means really only make sense for positive numbers. What is the geometric mean of 3 and -3? Is it imaginary?? The chain of inequalities on the Wikipedia page you link to are only valid if all values are positive.
Yes, this is a problem with floating point precision. -1/2 can be represented exactly, but 1/3 and 1/6 can't. Thus when you add them up, you don't quite get zero.
You can do the use-a-common-denominator approach you mentioned (the H2 and H3 formulas you posted), but that just kicks the can down the road a bit, you'll still get inaccurate results once the sum-of-products term starts rounding off.
Why are you taking the harmonic mean of numbers that might be negative, anyway? That's an inherently unstable calculation (H(-2,3,6+epsilon) varies widely for very small epsilon).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With