Efficient way to compute geometric mean of many numbers

Tags: c++, c, algorithm, numerical, underflow

I need to compute the geometric mean of a large set of numbers whose values are not bounded a priori. The naive way would be:

    #include <cmath>
    #include <vector>

    double geometric_mean(std::vector<double> const& data)  // failure
    {
        auto product = 1.0;
        for (auto x : data) product *= x;  // can overflow to inf or underflow to 0
        return std::pow(product, 1.0 / data.size());
    }

However, this may well fail because of underflow or overflow in the accumulated product (note: long double doesn't really avoid this problem). So the next option is to sum up the logarithms:

    #include <cmath>
    #include <vector>

    double geometric_mean(std::vector<double> const& data)
    {
        auto sum_log = 0.0;
        for (auto x : data) sum_log += std::log(x);  // one std::log call per element
        return std::exp(sum_log / data.size());
    }
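A quick way to see both failure modes side by side, as a sketch: `geometric_mean_naive` and `geometric_mean_log` are hypothetical renamed copies of the two functions above (renamed only so they can live in one file), with an added non-empty precondition. Inputs are assumed positive.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Copy of the naive product-based version (hypothetical name).
double geometric_mean_naive(std::vector<double> const& data)
{
    assert(!data.empty());
    double product = 1.0;
    for (double x : data) product *= x;  // may overflow to inf or underflow to 0
    return std::pow(product, 1.0 / data.size());
}

// Copy of the log-sum version (hypothetical name).
double geometric_mean_log(std::vector<double> const& data)
{
    assert(!data.empty());
    double sum_log = 0.0;
    for (double x : data) sum_log += std::log(x);  // magnitudes stay small
    return std::exp(sum_log / data.size());
}
```

With ten copies of 1e300, `geometric_mean_naive` returns infinity, because the product already exceeds the largest double (about 1.8e308) after the second factor, while `geometric_mean_log` returns approximately 1e300. With ten copies of 1e-300, the naive product underflows to 0.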

This works, but it calls std::log() for every element, which is potentially slow. Can I avoid that, for example by keeping track of (the equivalent of) the exponent and the mantissa of the accumulated product separately?
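The standard library already provides this split: `std::frexp` from `<cmath>` returns a mantissa `f` with 0.5 <= |f| < 1 and stores an integer exponent `e` such that `x == f * 2^e` exactly. A minimal wrapper to illustrate it, where `BinarySplit` and `split_binary` are hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: x == mantissa * 2^exponent.
struct BinarySplit
{
    double mantissa;  // 0.5 <= |mantissa| < 1 for finite nonzero x
    int exponent;
};

BinarySplit split_binary(double x)
{
    assert(std::isfinite(x));  // frexp leaves the exponent unspecified for inf/NaN
    BinarySplit s{};
    s.mantissa = std::frexp(x, &s.exponent);
    return s;
}
```

For example, `split_binary(12.0)` gives mantissa 0.75 and exponent 4, since 12 = 0.75 · 2^4, and `std::ldexp(s.mantissa, s.exponent)` reconstructs the original value exactly.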


asked Nov 14 '13 14:11

Walter

1 Answer

The "split exponent and mantissa" solution:

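    #include <cmath>
    #include <vector>

    double geometric_mean(std::vector<double> const & data)
    {
        double m = 1.0;     // product of the mantissas, each in [0.5, 1)
        long long ex = 0;   // sum of the binary exponents
        double invN = 1.0 / data.size();

        for (double x : data)
        {
            int i;
            double f1 = std::frexp(x, &i);  // x == f1 * 2^i
            m *= f1;   // caveat: can underflow for more than ~1000 elements
            ex += i;
        }

        // frexp always decomposes in base 2, so the exponent part is 2^(ex/N)
        return std::pow(2.0, ex * invN) * std::pow(m, invN);
    }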
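The identity behind this code, writing each positive input as x_k = f_k · 2^{e_k} with f_k in [1/2, 1), as `std::frexp` returns it:

```latex
\[
G = \Bigl(\prod_{k=1}^{N} x_k\Bigr)^{1/N}
  = \Bigl(2^{\sum_{k} e_k} \prod_{k=1}^{N} f_k\Bigr)^{1/N}
  = 2^{\frac{1}{N}\sum_{k} e_k}\,\Bigl(\prod_{k=1}^{N} f_k\Bigr)^{1/N},
\qquad
\prod_{k=1}^{N} f_k \;\ge\; 2^{-N}.
\]
```

In the code, `ex` holds the sum of the e_k and `m` the product of the f_k. The lower bound shows why `m` alone is not safe: in the worst case it falls below the smallest normal double, 2^{-1022}, once N exceeds 1022.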

~~If you are concerned that ex might overflow you can define it as a double instead of a long long, and multiply by invN at every step, but you might lose a lot of precision with this approach.~~

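EDIT: Because every mantissa returned by std::frexp lies in [0.5, 1), the running product m above can underflow once the input holds more than about a thousand elements. For large inputs, we can therefore split the computation into buckets small enough that each partial product stays in the normal range: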

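    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    double geometric_mean(std::vector<double> const & data)
    {
        long long ex = 0;  // accumulated binary exponent
        // Multiply the mantissas of data[first, last) and add their exponents to ex.
        auto do_bucket = [&data, &ex](std::size_t first, std::size_t last) -> double
        {
            double ans = 1.0;
            for (; first != last; ++first)
            {
                int i;
                ans *= std::frexp(data[first], &i);
                ex += i;
            }
            return ans;
        };

        // -log2(DBL_MIN) == 1022: a product of at most 1022 mantissas (each >= 0.5)
        // is >= 2^-1022 == DBL_MIN, so no bucket can underflow.
        const std::size_t bucket_size = static_cast<std::size_t>(
            -std::log2(std::numeric_limits<double>::min()));
        std::size_t buckets = data.size() / bucket_size;

        double invN = 1.0 / data.size();
        double m = 1.0;

        for (std::size_t i = 0; i < buckets; ++i)
            m *= std::pow(do_bucket(i * bucket_size, (i + 1) * bucket_size), invN);

        // leftover elements that do not fill a whole bucket
        m *= std::pow(do_bucket(buckets * bucket_size, data.size()), invN);

        // frexp decomposes in base 2, so the exponent part is 2^(ex/N)
        return std::pow(2.0, ex * invN) * m;
    }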
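A different way to keep the running mantissa from underflowing, which is not the approach taken in this answer: renormalize `m` with `std::frexp` after every multiplication and move its exponent into `ex`. This spends one extra `frexp` per element but needs no buckets and only two `std::pow` calls in total. A sketch, assuming all inputs are positive and finite; the name `geometric_mean_renorm` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of positive, finite values via separate mantissa/exponent
// tracking, renormalizing the running mantissa after every multiply.
double geometric_mean_renorm(std::vector<double> const& data)
{
    assert(!data.empty());
    double m = 1.0;    // running mantissa, in [0.5, 1) after each iteration
    long long ex = 0;  // running binary exponent
    for (double x : data)
    {
        int e;
        m *= std::frexp(x, &e);  // product of two values in [0.5, 1) is >= 0.25
        ex += e;
        m = std::frexp(m, &e);   // pull the exponent (0 or -1) back out of m
        ex += e;
    }
    // The full product is m * 2^ex, so its N-th root is 2^(ex/N) * m^(1/N).
    double invN = 1.0 / data.size();
    return std::pow(2.0, ex * invN) * std::pow(m, invN);
}
```

For instance, the inputs {2.0, 8.0} yield 4 up to rounding, and 5000 copies of 1e300 yield approximately 1e300 even though neither the naive product nor an unnormalized mantissa product would stay representable.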


answered Sep 24 '22 13:09

sbabbi
