I need to compute the geometric mean of a large set of numbers, whose values are not a priori limited. The naive way would be
double geometric_mean(std::vector<double> const&data) // failure { auto product = 1.0; for(auto x:data) product *= x; return std::pow(product,1.0/data.size()); }
However, this may well fail because of underflow or overflow in the accumulated product
(note: long double
doesn't really avoid this problem). So, the next option is to sum-up the logarithms:
double geometric_mean(std::vector<double> const&data) { auto sumlog = 0.0; for(auto x:data) sum_log += std::log(x); return std::exp(sum_log/data.size()); }
This works, but calls std::log()
for every element, which is potentially slow. Can I avoid that? For example by keeping track of (the equivalent of) the exponent and the mantissa of the accumulated product
separately?
Basically, we multiply the numbers altogether and take the nth root of the multiplied numbers, where n is the total number of data values. For example: for a given set of two numbers such as 3 and 1, the geometric mean is equal to √(3×1) = √3 = 1.732.
Basically, we multiply the 'n' values altogether and take out the nth root of the numbers, where n is the total number of values. For example: for a given set of two numbers such as 8 and 1, the geometric mean is equal to √(8×1) = √8 = 2√2.
Formula for Geometric Mean For GM formula, multiply all the “n” numbers together and take the “nth root of them.
The "split exponent and mantissa" solution:
double geometric_mean(std::vector<double> const & data) { double m = 1.0; long long ex = 0; double invN = 1.0 / data.size(); for (double x : data) { int i; double f1 = std::frexp(x,&i); m*=f1; ex+=i; } return std::pow( std::numeric_limits<double>::radix,ex * invN) * std::pow(m,invN); }
If you are concerned that ex
might overflow you can define it as a double instead of a long long
, and multiply by invN
at every step, but you might lose a lot of precision with this approach.
EDIT For large inputs, we can split the computation in several buckets:
double geometric_mean(std::vector<double> const & data) { long long ex = 0; auto do_bucket = [&data,&ex](int first,int last) -> double { double ans = 1.0; for ( ;first != last;++first) { int i; ans *= std::frexp(data[first],&i); ex+=i; } return ans; }; const int bucket_size = -std::log2( std::numeric_limits<double>::min() ); std::size_t buckets = data.size() / bucket_size; double invN = 1.0 / data.size(); double m = 1.0; for (std::size_t i = 0;i < buckets;++i) m *= std::pow( do_bucket(i * bucket_size,(i+1) * bucket_size),invN ); m*= std::pow( do_bucket( buckets * bucket_size, data.size() ),invN ); return std::pow( std::numeric_limits<double>::radix,ex * invN ) * m; }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With