I am trying to compute this posterior distribution in R. The problem is that the numerator, which is the product of many Bernoulli densities dbern(y_i, p_i), each less than 1, is too small (my n is about 1500). Hence, R spits out 0, and the posterior values for all \theta are also 0.
To clarify, each y_i has its own p_i; together these p_i's form an n-element vector for the n y's, and each value of theta implies its own n-element vector of p_i's.
A reproducible example (of the numerator):
p <- sample(seq(0.001, 0.999, by = 0.01), 1500, replace = TRUE)
y <- sample(c(0, 1), 1500, replace = TRUE)
dbern(y, p)                  # dbern() is from e.g. the Rlab package; equivalently dbinom(y, 1, p)
                             # a 1500-element vector, each element < 1
prod(dbern(y, p))            # produces 0 (underflow)
exp(sum(log(dbern(y, p))))   # also produces 0
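To see why even the log-sum version underflows at the final exp(): the sum of logs is a perfectly finite (large negative) number, but double precision cannot represent anything smaller than roughly exp(-745), so exponentiating it still gives 0. A quick illustration with the simulated example above:
sum(log(dbern(y, p)))   # finite but large and negative (on the order of -1500 here)
exp(-1500)              # 0 -- underflows, since exp(-745) is roughly the smallest positive double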
EDIT (context): I am doing a Bayesian change point analysis (jstor.org/stable/25791783 - Western and Kleykamp 2004). Unlike the continuous y in the paper, my y is binary, so I'm using the data augmentation method in Albert and Chib (1993). With that method, the likelihood of y is Bernoulli, with p = Φ(x'β), the normal CDF of x'β.
So how does p depend on theta? Theta is the change point: one of the x's is a time dummy, so if theta = 10, for example, the time dummy is 1 for all observations after day 10 and 0 for all observations before day 10.
Thus p depends on x, x depends on theta, and so p depends on theta.
I need the above quantity because it is the full conditional of theta in Gibbs sampling.
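For concreteness, here is a minimal sketch of how p depends on theta under this setup (the single time dummy plus an intercept, and the beta values, are illustrative assumptions, not values from the paper):
# Sketch: p as a function of the change point theta (illustrative beta values)
n    <- 1500
time <- 1:n
beta <- c(-0.5, 1.2)                  # hypothetical (intercept, time-dummy) coefficients

p_given_theta <- function(theta) {
  dummy <- as.numeric(time > theta)   # 1 after the change point, 0 before
  X     <- cbind(1, dummy)
  as.vector(pnorm(X %*% beta))        # probit: p_i = Phi(x_i' beta)
}

p_given_theta(10)                     # the n-element vector of p_i's for theta = 10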
One way to tackle precision problems like this is to work in log space. However, that turns the denominator into a log of a sum of products (a log-sum-exp), which can be painful in general.
If you're calculating the posterior for purposes of optimization, be aware that you may be able to drop the denominator completely: you don't need to normalize to find an argmax.
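A minimal sketch of that idea, assuming a flat prior on theta and reusing the hypothetical p_given_theta() helper from the sketch above (y here is just placeholder data):
# Unnormalized log-posterior of theta; the argmax does not need the denominator
y <- sample(c(0, 1), 1500, replace = TRUE)        # placeholder data

log_lik_theta <- function(theta, y) {
  p <- p_given_theta(theta)                       # p_i's implied by this theta
  sum(dbinom(y, size = 1, prob = p, log = TRUE))  # Bernoulli log-likelihood, no underflow
}

thetas    <- 1:1499                               # candidate change points
log_post  <- sapply(thetas, log_lik_theta, y = y) # add the log prior here if it isn't flat
theta_hat <- thetas[which.max(log_post)]          # argmax over theta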
Well, I ran your example, and you are (as expected) getting 0 because there is a 0 in the vector of densities:
p <- sample(seq(0, 1, by = 0.01), 1500, replace = TRUE)  # this grid includes p = 0 and p = 1, which give exact-zero densities
y <- sample(c(0, 1), 1500, replace = TRUE)
x <- dbern(y, p)
any(x == 0)
## [1] TRUE
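(Using x and p from the example just above: the exact zeros come only from observations where p and y contradict each other, i.e. p = 0 with y = 1 or p = 1 with y = 0; those are the terms the NB in the answer below says to pull out and handle separately.)
table(p[x == 0])   # only p = 0 and p = 1 show up among the zero-density terms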
I asked this question on Cross Validated also, and glen_b gave me this (tested) solution below:
This is a common problem with computation of likelihoods for all manner of models; the kinds of things that are commonly done are to work on logs and to use a common scaling factor that brings the values into a more reasonable range.
In this case, I'd suggest:
Step 1: Pick a fairly "typical" θ, call it θ0. Divide both the numerator and the denominator of the general term by the numerator evaluated at θ = θ0, in order to get something that is much less likely to underflow.
Step 2: Work on the log scale. This means the numerator is an exp of sums of differences of logs, and the denominator is a sum of exps of sums of differences of logs.
NB: If any of your p's are 0 or 1, pull those out separately and don't take logs of those terms; they're easy to evaluate as is!
The usual terms in the numerator will tend to be more moderate in size, and so in many situations the numerator and denominator are both relatively reasonable.
If there is a range of sizes in the denominator, add up the smaller terms before adding the larger ones.
If one or a few terms dominate heavily, you should focus your attention on making the computation for those relatively accurate.
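To make that concrete, here is a sketch of Steps 1 and 2 for the full conditional of theta (assumptions: a flat prior over the candidate thetas, and the hypothetical log_lik_theta() helper and y from the earlier sketch; using the maximum log-numerator as the reference point is the usual log-sum-exp variant of Step 1 and guarantees nothing overflows):
# Full conditional of theta via log-scale centering (log-sum-exp)
thetas  <- 1:1499
log_num <- sapply(thetas, log_lik_theta, y = y)  # log numerator for every candidate theta

ref  <- max(log_num)              # Step 1: reference value (a "typical" / maximal theta0)
w    <- exp(log_num - ref)        # Step 2: exp of differences of logs; the largest term is exactly 1
post <- w / sum(w)                # normalized full conditional p(theta | ...)

theta_draw <- sample(thetas, 1, prob = post)  # draw theta inside the Gibbs sweep
Per the last point above, summing with sum(sort(w)) adds the smaller terms first if you are worried about accumulation error.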