
Calculating geometric mean of a long list of random doubles

Tags:

java

math

So, I came across a problem today in my construction of a restricted Boltzmann machine that should be trivial, but seems to be troublingly difficult. Basically I'm initializing 2k values to random doubles between 0 and 1.

What I would like to do is calculate the geometric mean of this data set. The problem I'm running into is that, because the data set is so long, multiplying everything together always underflows to zero, while taking the proper root at every step just rails each value to 1.

I could potentially chunk the list up, but I think that's really gross. Any ideas on how to do this in an elegant way?

In theory I would like to extend my current RBM code to have closer to 15k+ entries, and be able to run the RBM across multiple threads. Sadly, this rules out both Apache Commons Math (its geometric mean method is not synchronized) and longs.

Slater Victoroff asked Apr 23 '13 08:04


2 Answers

Wow, using a big decimal type is way overkill!

Just take the logarithm of everything, find the arithmetic mean, and then exponentiate.
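A minimal Java sketch of that idea follows. The class and method names (`LogGeometricMean`, `geometricMean`) are made up for illustration, and the code assumes every input is strictly positive, since `Math.log(0)` is negative infinity. The method only touches local variables, so it can be called from several threads at once without synchronization.

```java
import java.util.Random;

public class LogGeometricMean {

    // Geometric mean computed as exp( (1/N) * sum(ln x_i) ).
    // Assumption: every value is > 0.
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] data = new double[15_000];
        for (int i = 0; i < data.length; i++) {
            // nextDouble() is in [0, 1); flipping it gives (0, 1] and avoids log(0).
            data[i] = 1.0 - rng.nextDouble();
        }
        System.out.println(geometricMean(data));
    }
}
```

The sum of logarithms grows only linearly with the number of entries, so it stays far away from the underflow that kills the direct product.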

user541686 answered Nov 10 '22 15:11


Mehrdad's logarithm solution certainly works. You can do it faster (and possibly more accurately), though:

  1. Compute the sum of the exponents of the numbers, say S.
  2. Slam all of the exponents to zero so that each number is between 1/2 and 1.
  3. Group the numbers into bunches of at most 1000.
    • For each group, compute the product of the numbers. This will not underflow.
    • Add the exponent of the product to S and slam the exponent to zero.
  4. You now have about 1/1000 as many numbers. Repeat steps 2 and 3 unless you only have one number.
  5. Call the one remaining number T. The geometric mean is $T^{1/N} \cdot 2^{S/N}$, where N is the size of the input.
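The steps above can be sketched in Java roughly as follows. This is one possible reading, not a reference implementation: the names `ExponentGeometricMean`, `binaryExponent` and `GROUP` are hypothetical, the final step is taken to mean T^(1/N) * 2^(S/N), and inputs are assumed to be positive and finite. "Slamming the exponent to zero" is done with `Math.getExponent` and `Math.scalb`, which rescale by an exact power of two.

```java
public class ExponentGeometricMean {

    static final int GROUP = 1000; // group size from step 3

    // Returns e such that d * 2^(-e) lies in [0.5, 1). Assumes d > 0 and finite.
    static int binaryExponent(double d) {
        int e = Math.getExponent(d);
        if (e < Double.MIN_EXPONENT) {
            // Subnormal input: scale up by 2^54 (exact) before reading the exponent.
            return Math.getExponent(d * 0x1.0p54) - 54 + 1;
        }
        return e + 1;
    }

    static double geometricMean(double[] values) {
        int n = values.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty input");
        }
        long s = 0;                  // S: running sum of extracted exponents
        double[] m = new double[n];  // mantissas, each in [0.5, 1)

        // Steps 1 and 2: pull the exponent out of every input.
        for (int i = 0; i < n; i++) {
            double d = values[i];
            if (!(d > 0.0) || Double.isInfinite(d)) {
                throw new IllegalArgumentException("values must be positive and finite");
            }
            int e = binaryExponent(d);
            s += e;
            m[i] = Math.scalb(d, -e);
        }

        // Steps 3 and 4: multiply in groups, renormalize, repeat until one number is left.
        int count = n;
        while (count > 1) {
            int out = 0;
            for (int start = 0; start < count; start += GROUP) {
                int end = Math.min(start + GROUP, count);
                double p = 1.0;
                for (int j = start; j < end; j++) {
                    p *= m[j]; // at least 2^-1000, so still a normal double
                }
                int e = Math.getExponent(p) + 1;
                s += e;
                m[out++] = Math.scalb(p, -e);
            }
            count = out;
        }

        // Step 5: combine the last mantissa T with the accumulated exponent S.
        double t = m[0];
        return Math.pow(t, 1.0 / n) * Math.pow(2.0, (double) s / n);
    }

    public static void main(String[] args) {
        System.out.println(geometricMean(new double[] {0.25, 0.5, 0.125}));
    }
}
```

As with the logarithm version, the method keeps all state in local variables, so separate threads can call it on separate arrays without locking.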
tmyklebu answered Nov 10 '22 13:11