I'm looking for a script or package in R (Python would do too) to estimate the component distribution parameters of a mixture of Gaussian and Gamma distributions. So far I've used the R package "mixtools" to model the data as a mixture of Gaussians, but I think it can be better modeled by a Gamma plus a Gaussian.
Thanks
Here's one possibility:
Define utility functions:
rnormgammamix <- function(n,shape,rate,mean,sd,prob) {
    ifelse(runif(n)<prob,
           rgamma(n,shape,rate),
           rnorm(n,mean,sd))
}
(This could be made a little bit more efficient ...)
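As a rough sketch of what "more efficient" could mean here: the `ifelse` version draws `n` values from both components and throws about half of them away. The variant below (the name `rnormgammamix2` is made up for illustration) draws the component labels first and then generates only as many values from each component as are needed. Note that it consumes random numbers in a different order, so with the same seed it will not reproduce the exact same sample.

```r
# Hypothetical alternative sampler: draw component labels first,
# then generate only the values that are actually needed.
rnormgammamix2 <- function(n, shape, rate, mean, sd, prob) {
    is_gamma <- runif(n) < prob        # TRUE -> value comes from the Gamma component
    n_gamma <- sum(is_gamma)
    x <- numeric(n)
    x[is_gamma] <- rgamma(n_gamma, shape, rate)
    x[!is_gamma] <- rnorm(n - n_gamma, mean, sd)
    x
}

set.seed(101)
x <- rnormgammamix2(1000, 1.5, 2, 3, 2, 0.5)
```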
dnormgammamix <- function(x,shape,rate,mean,sd,prob,log=FALSE) {
    r <- prob*dgamma(x,shape,rate)+(1-prob)*dnorm(x,mean,sd)
    if (log) log(r) else r
}
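A side note on the `log=TRUE` branch: far out in the tails both component densities can underflow to zero, so `log(r)` becomes `-Inf`, which can upset an optimizer that wanders into extreme parameter values. A possible workaround, sketched here under the assumption that this matters for your data, is to combine the two log densities with the log-sum-exp trick. The name `dnormgammamix_stable` is hypothetical; it is meant as a drop-in alternative, not something from a package.

```r
# Hypothetical, numerically more stable version of the mixture density.
# Works on the log scale and combines the components via log-sum-exp.
dnormgammamix_stable <- function(x, shape, rate, mean, sd, prob, log = FALSE) {
    lg <- log(prob) + dgamma(x, shape, rate, log = TRUE)    # log(prob * Gamma density)
    ln <- log1p(-prob) + dnorm(x, mean, sd, log = TRUE)     # log((1 - prob) * Normal density)
    m <- pmax(lg, ln)
    # If both terms are -Inf (or NaN), return m itself instead of computing NaN.
    lr <- ifelse(is.finite(m), m + log(exp(lg - m) + exp(ln - m)), m)
    if (log) lr else exp(lr)
}

xs <- seq(-2, 8, length = 11)
naive <- 0.5 * dgamma(xs, 1.5, 2) + 0.5 * dnorm(xs, 3, 2)
stable <- dnormgammamix_stable(xs, 1.5, 2, 3, 2, 0.5)
far_tail <- dnormgammamix_stable(1000, 1.5, 2, 3, 2, 0.5, log = TRUE)
```

In the ordinary range the two versions should agree up to floating-point error; the difference only shows up for points like `x = 1000`, where the naive log density is `-Inf` but the stable one stays finite.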
Generate fake data:
set.seed(101)
r <- rnormgammamix(1000,1.5,2,3,2,0.5)
d <- data.frame(r)
Approach #1: the bbmle package. Fit shape, rate, and standard deviation on the log scale, and prob on the logit scale.
library("bbmle")
m1 <- mle2(r~dnormgammamix(exp(logshape),exp(lograte),mean,exp(logsd),
                           plogis(logitprob)),
           data=d,
           start=list(logshape=0,lograte=0,mean=0,logsd=0,logitprob=0))
cc <- coef(m1)
png("normgam.png")
par(bty="l",las=1)
hist(r,breaks=100,col="gray",freq=FALSE)
rvec <- seq(-2,8,length=101)
pred <- with(as.list(cc),
             dnormgammamix(rvec,exp(logshape),exp(lograte),mean,
                           exp(logsd),plogis(logitprob)))
lines(rvec,pred,col=2,lwd=2)
true <- dnormgammamix(rvec,1.5,2,3,2,0.5)
lines(rvec,true,col=4,lwd=2)
dev.off()
tcc <- with(as.list(cc),
            c(shape=exp(logshape),
              rate=exp(lograte),
              mean=mean,
              sd=exp(logsd),
              prob=plogis(logitprob)))
cbind(tcc,c(1.5,2,3,2,0.5))
The fit is reasonable, but the parameter estimates are fairly far off. I think this model isn't very strongly identifiable in this parameter regime (i.e., the Gamma and Gaussian components can be swapped).
library("MASS")
ff <- fitdistr(r,dnormgammamix,
               start=list(shape=1,rate=1,mean=0,sd=1,prob=0.5))
cbind(tcc,ff$estimate,c(1.5,2,3,2,0.5))
fitdistr gets the same result as mle2, which suggests we're stuck in a local minimum of the negative log-likelihood. If we start from the true parameters, we get to something reasonable and near the true parameters.
ff2 <- fitdistr(r,dnormgammamix,
                start=list(shape=1.5,rate=2,mean=3,sd=2,prob=0.5))
-logLik(ff2) ## 1725.994
-logLik(ff) ## 1755.458
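With real data you don't know the true parameters, so one common workaround for local minima is to fit from several starting points and keep the fit with the smallest negative log-likelihood. Here is a minimal, self-contained sketch of that idea. The helper name `dmix` and the third starting point are arbitrary choices for illustration, and in practice you would probably want more (and more varied) starts.

```r
library("MASS")

# Mixture density, same form as dnormgammamix above (renamed to keep this block standalone).
dmix <- function(x, shape, rate, mean, sd, prob, log = FALSE) {
    r <- prob * dgamma(x, shape, rate) + (1 - prob) * dnorm(x, mean, sd)
    if (log) log(r) else r
}

# Simulated data with the same true parameters as in the example above.
set.seed(101)
n <- 1000
x <- ifelse(runif(n) < 0.5, rgamma(n, 1.5, 2), rnorm(n, 3, 2))

# Candidate starting values: the two used above plus one arbitrary extra.
starts <- list(
    list(shape = 1,   rate = 1, mean = 0, sd = 1, prob = 0.5),
    list(shape = 1.5, rate = 2, mean = 3, sd = 2, prob = 0.5),
    list(shape = 2,   rate = 1, mean = 2, sd = 1, prob = 0.3)
)

# Fit from each start; a start that makes the optimizer fail is skipped.
fits <- lapply(starts, function(s) {
    tryCatch(suppressWarnings(fitdistr(x, dmix, start = s)),
             error = function(e) NULL)
})
fits <- Filter(Negate(is.null), fits)

# Negative log-likelihood of each successful fit; smaller is better.
nll <- vapply(fits, function(f) -as.numeric(logLik(f)), numeric(1))
best <- fits[[which.min(nll)]]
best$estimate
```

This doesn't guarantee finding the global optimum; it only makes it less likely that a single unlucky start decides the answer.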