R : function to generate a mixture distribution

Tags:

I need to generate samples from a mixed distribution

40% samples come from Gaussian(mean=2,sd=8)
20% samples come from Cauchy(location=25,scale=2)
40% samples come from Gaussian(mean = 10, sd=6)

To do this, i wrote the following function :

dmix <- function(x){
prob <- (0.4 * dnorm(x,mean=2,sd=8)) + (0.2 * dcauchy(x,location=25,scale=2)) + (0.4 * dnorm(x,mean=10,sd=6))
return (prob)
}

And then tested with:

foo = seq(-5,5,by = 0.01)
vector = NULL
for (i in 1:1000){
vector[i] <- dmix(foo[i])
}
hist(vector)

I'm getting a histogram like this (which I know is wrong) - histogram of the supposed distribution

What am I doing wrong? Can anyone give some pointers please?

548

asked May 05 '14 19:05

Raaj

2 Answers

There are of course other ways to do this, but the distr package makes it pretty darned simple. (See also this answer for another example and some more details about distr and friends).

library(distr)

## Construct the distribution object.
myMix <- UnivarMixingDistribution(Norm(mean=2, sd=8), 
                                  Cauchy(location=25, scale=2),
                                  Norm(mean=10, sd=6),
                                  mixCoeff=c(0.4, 0.2, 0.4))
## ... and then a function for sampling random variates from it
rmyMix <- r(myMix)

## Sample a million random variates, and plot (part of) their histogram
x <- rmyMix(1e6)
hist(x[x>-100 & x<100], breaks=100, col="grey", main="")

enter image description here

And if you'd just like a direct look at your mixture distribution's pdf, do:

plot(myMix, to.draw.arg="d")

enter image description here

113

answered Sep 19 '22 10:09

Josh O'Brien

Always use R vectorization, if you can. Even if many values are actually discarded, it's often more efficient. (at least faster than the previous solution, and avoids extra - libraries)

rmy_ve = function(n){

##generation of (n x 3) matrix. 
##Each column is a random sample of size n from a single component of the mixture
temp = cbind(rnorm(n,2,8),rcauchy(n,25,2),rnorm(n,10,6))

##random generation of the indices
id = sample(1:3,n,rep = T,prob = c(.4,.2,.4))  
id = cbind(1:n,id)
temp[id]
}


> microbenchmark(rmy_ve(1e6),rmyMix(1e6))
Unit: milliseconds
       expr     min       lq     mean   median       uq      max    neval
rmy_ve(1e+06) 241.904 248.7528 272.9119 260.8752 298.1126 322.7429   100
rmyMix(1e+06) 270.917 322.3627 341.4779 329.1706 364.3947 561.2608   100

answered Sep 22 '22 10:09

Meme

Related questions
                            
                                Attractive 3D plot in R
                            
                                Rewriting slow R function in C++ & Rcpp
                            
                                ggplot2 mapping county boundries in one color and state boundries in another on the same map
                            
                                create templates using ggplot2 syntax?
                            
                                Use of offset in lm regression - R
                            
                                Connect R and Vertica using RODBC
                            
                                Remove consecutive duplicate entries
                            
                                Make a boxplot without whiskers
                            
                                What is the fastest way to obtain frequencies of integers in a vector?
                            
                                Storing results of loop iterations in R
                            
                                converting numbers to time
                            
                                R ave by columns
                            
                                glmer - predict with binomial data (cbind count data)
                            
                                Import text file using ff package
                            
                                How to produce a meaningful draftsman/correlation plot for discrete values
                            
                                element as the list names and list name as the element in a list?
                            
                                Imputing missing values linearly in R
                            
                                How can I find the index of all NA in a dataframe column?
                            
                                R 3.0.3 rbind multiple csv files
                            
                                Why do variable lookups in the body of function A take values from the global environment but not function B that calls A?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

R : function to generate a mixture distribution

Tags:

r

probability

markov-chains

mcmc

mixture-model

Raaj

People also ask

2 Answers

Josh O'Brien

Meme

Recent Activity

Donate For Us