Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to compute geometric mean of many numbers

I need to compute the geometric mean of a large set of numbers, whose values are not a priori limited. The naive way would be

double geometric_mean(std::vector<double> const&data) // failure {   auto product = 1.0;   for(auto x:data) product *= x;   return std::pow(product,1.0/data.size()); } 

However, this may well fail because of underflow or overflow in the accumulated product (note: long double doesn't really avoid this problem). So, the next option is to sum-up the logarithms:

double geometric_mean(std::vector<double> const&data) {   auto sumlog = 0.0;   for(auto x:data) sum_log += std::log(x);   return std::exp(sum_log/data.size()); } 

This works, but calls std::log() for every element, which is potentially slow. Can I avoid that? For example by keeping track of (the equivalent of) the exponent and the mantissa of the accumulated product separately?

like image 696
Walter Avatar asked Nov 14 '13 14:11

Walter


People also ask

How do you find the geometric mean of three values?

Basically, we multiply the numbers altogether and take the nth root of the multiplied numbers, where n is the total number of data values. For example: for a given set of two numbers such as 3 and 1, the geometric mean is equal to √(3×1) = √3 = 1.732.

What is the easiest way to calculate geometric mean?

Basically, we multiply the 'n' values altogether and take out the nth root of the numbers, where n is the total number of values. For example: for a given set of two numbers such as 8 and 1, the geometric mean is equal to √(8×1) = √8 = 2√2.

How do you find the geometric mean of 6 numbers?

Formula for Geometric Mean For GM formula, multiply all the “n” numbers together and take the “nth root of them.


1 Answers

The "split exponent and mantissa" solution:

double geometric_mean(std::vector<double> const & data) {     double m = 1.0;     long long ex = 0;     double invN = 1.0 / data.size();      for (double x : data)     {         int i;         double f1 = std::frexp(x,&i);         m*=f1;         ex+=i;     }      return std::pow( std::numeric_limits<double>::radix,ex * invN) * std::pow(m,invN); } 

If you are concerned that ex might overflow you can define it as a double instead of a long long, and multiply by invN at every step, but you might lose a lot of precision with this approach.

EDIT For large inputs, we can split the computation in several buckets:

double geometric_mean(std::vector<double> const & data) {     long long ex = 0;     auto do_bucket = [&data,&ex](int first,int last) -> double     {         double ans = 1.0;         for ( ;first != last;++first)         {             int i;             ans *= std::frexp(data[first],&i);             ex+=i;         }         return ans;     };      const int bucket_size = -std::log2( std::numeric_limits<double>::min() );     std::size_t buckets = data.size() / bucket_size;      double invN = 1.0 / data.size();     double m = 1.0;      for (std::size_t i = 0;i < buckets;++i)         m *= std::pow( do_bucket(i * bucket_size,(i+1) * bucket_size),invN );      m*= std::pow( do_bucket( buckets * bucket_size, data.size() ),invN );      return std::pow( std::numeric_limits<double>::radix,ex * invN ) * m; } 
like image 119
sbabbi Avatar answered Sep 24 '22 13:09

sbabbi