Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to deal with the log of a sum of more than two super small probabilities

For some evil reason I need to calculate the log of the sum of 500 super small probabilities, each term computed by

dmvnorm(X[,i], mean=rep(0,3), sigma=diag(3))

Sometimes the codes above return 0 due to underflow, but using logarithms will be fine.

dmvnorm(X[,i], mean=rep(0,3), sigma=diag(3), log=TRUE)

I know I can handle two terms mathematically: log(p1 + p2) = log(p2) + log(1 + p1/p2). But Can we generalize this approach to more terms? Anyone with more experience on this?


BTW, I wrote a recursive function to compute this. Mathematically, it works. But I don't think this is practical.

MESSY <- function (pv) 
{
  N <- length(pv)
  if (N==1)
    {return(pv[1])}
  else
    {w <- pv[N]
     pv <- pv[1:N-1]-w
     return(w + log(1 + exp(MESSY(pv))))
    }
}

Because some p are super small and I can only use w=log(p), we have log(exp(w1)+exp(w2)) = w2 + log(1+exp(w1-w2)) and log(exp(w1)+exp(w2)+exp(w3)) = w3 + log(1 + exp(w1-w3) + exp(w2-w3)) for two and three terms.

like image 586
Paw in Data Avatar asked Sep 11 '25 10:09

Paw in Data


1 Answers

This function is translated from the internal logspace_add function in the R source code here

logspace_add <- function(logx,logy) { 
    pmax(logx,logy) + log1p(exp(-abs(logx - logy))) 
}

Not necessarily the most efficient, but you should be able to do this for >2 elements by using Reduce():

logspace_add_mult <- function(x) {
    Reduce(logspace_add, x)
}

Quick test (using values that are large enough not to underflow, so that we can compare the results of the regular and logspace calculations).

x <- c(1e-4,1e-5,1e-6)
logspace_add_mult(log(x))
## [1] -9.10598
log(sum(x))
## [1] -9.10598

As far as I know, this is more or less the standard approach to logspace addition. The advantage of using something other than this home-grown implementation would be (1) code maturity and testing and (2) speed (at least for the logspace_add_mult version; I doubt there would be much advantage of a native-C (or whatever) implementation of logspace_add). The Brobdingnag package uses similar representations:

library(Brobdingnag)
brob(log(x))
## [1] +exp(-9.2103) +exp(-11.513) +exp(-13.816)
sum(brob(log(x)))
## [1] +exp(-9.106)
log(as.numeric(sum(brob(log(x)))))
## [1] -9.10598

In Python, numpy has logaddexp, but this only works pairwise: you can use functools.reduce() to generalize it as above.

import numpy as np
import functools as f
x = np.array([1e-4,1e-5,1e-6])
f.reduce(np.logaddexp,np.log(x))

This is probably a little faster than Reduce() in R.

like image 176
Ben Bolker Avatar answered Sep 14 '25 02:09

Ben Bolker