Generating samples from a two-Gaussian mixture in R (code given in MATLAB)

Tags: plot, r, matlab, sample, gaussian

I'm trying to write, in R, the equivalent of the following MATLAB function, which generates n samples from a mixture of N(m1, s1^2) and N(m2, s2^2), with a fraction alpha drawn from the first Gaussian.

I have a start, but the results differ notably between MATLAB and R (i.e., the MATLAB results give occasional values of ±8, but the R version never even gives a value of ±5). Please help me sort out what is wrong here. Thanks :-)

For example: plot 1000 samples from a mix of N(0,1) and N(0,36) with 95% of samples from the first Gaussian. Normalize the samples to mean zero and standard deviation one.

MATLAB function

function y = gaussmix(n,m1,m2,s1,s2,alpha)
y = zeros(n,1);
U = rand(n,1);
I = (U < alpha)
y = I.*(randn(n,1)*s1+m1) + (1-I).*(randn(n,1)*s2 + m2);

implementation

P = gaussmix(1000,0,0,1,6,.95)
P = (P-mean(P))/std(P)
plot(P)
axis([0 1000 -15 15])
hist(P)
axis([-15 15 0 1000])

Resulting plot: [figure: plot of randomly generated samples from two Gaussian distributions in MATLAB]

Resulting hist: [figure: histogram of randomly generated samples from two Gaussian distributions in MATLAB]

R

yn <- rbinom(1000, 1, .95)
s <- rnorm(1000, 0 + 0*yn, 1 + 36*yn)
sn <- (s-mean(s))/sd(s)
plot(sn, xlim=range(0,1000), ylim=range(-15,15))
hist(sn, xlim=range(-15,15), ylim=range(0,1000))

Resulting plot: [figure: plot of randomly generated samples from two Gaussian distributions in R]

Resulting hist: [figure: histogram of randomly generated samples from two Gaussian distributions in R]

As always, THANK YOU!

SOLUTION

gaussmix <- function(nsim,mean_1,mean_2,std_1,std_2,alpha){
   U <- runif(nsim)
   I <- as.numeric(U<alpha)
   y <- I*rnorm(nsim,mean=mean_1,sd=std_1)+
       (1-I)*rnorm(nsim,mean=mean_2,sd=std_2)
   return(y)
}

z1 <- gaussmix(1000,0,0,1,6,0.95)
z1_standardized <- (z1-mean(z1))/sqrt(var(z1))
z2 <- gaussmix(1000,0,3,1,1,0.80)
z2_standardized <- (z2-mean(z2))/sqrt(var(z2))
z3 <- rlnorm(1000)
z3_standardized <- (z3-mean(z3))/sqrt(var(z3))

par(mfrow=c(2,3))
hist(z1_standardized,xlim=c(-10,10),ylim=c(0,500),
   main="Histogram of 95% of N(0,1) and 5% of N(0,36)",
   col="blue",xlab=" ")
hist(z2_standardized,xlim=c(-10,10),ylim=c(0,500),
   main="Histogram of 80% of N(0,1) and 20% of N(3,1)",
   col="blue",xlab=" ")
hist(z3_standardized,xlim=c(-10,10),ylim=c(0,500),
   main="Histogram of samples of LN(0,1)",col="blue",xlab=" ")
##
plot(z1_standardized,type='l',
   main="1000 samples from a mixture N(0,1) and N(0,36)",
   col="blue",xlab="Samples",ylab="Standardized value",ylim=c(-10,10))
plot(z2_standardized,type='l',
   main="1000 samples from a mixture N(0,1) and N(3,1)",
   col="blue",xlab="Samples",ylab="Standardized value",ylim=c(-10,10))
plot(z3_standardized,type='l',
   main="1000 samples from LN(0,1)",
   col="blue",xlab="Samples",ylab="Standardized value",ylim=c(-10,10))

asked Sep 16 '12 19:09

ruya

2 Answers

There are two problems, I think ... (1) Your R code creates a mixture of normal distributions with standard deviations of 1 and 37: 1 + 36*yn is 1 when yn is 0 and 37 when yn is 1, and rnorm() takes a standard deviation, not a variance. (2) By setting prob equal to alpha in your rbinom() call, you're getting a fraction alpha in the second mode (sd 37) rather than the first. So what you are getting is a distribution that is mostly a Gaussian with sd 37, contaminated by a 5% mixture of a Gaussian with sd 1, rather than a Gaussian with sd 1 contaminated by a 5% mixture of a Gaussian with sd 6. Scaling by the standard deviation of the mixture (about 36, since sqrt(0.95*37^2 + 0.05*1^2) is roughly 36.1) basically reduces it to a standard Gaussian with a slight bump near the origin ...

(The other answers posted here do solve your problem perfectly well, but I thought you might be interested in a diagnosis ...)
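The diagnosis above can be checked directly. The sketch below reuses the question's rbinom call, tabulates the standard deviations it actually produces, and then shows one possible way to map the indicator to the intended values (sd 1 for the 95% component, sd 6 for the rest). The seed and the names sd_used and sd_fixed are illustrative assumptions, not part of the original code.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
n  <- 1000
yn <- rbinom(n, 1, 0.95)            # 1 with probability 0.95

# What the question's code does: sd is 37 when yn == 1 and 1 when yn == 0
sd_used <- 1 + 36 * yn
print(table(sd_used))               # most draws get sd = 37

# Intended mapping: yn == 1 -> N(0, 1); yn == 0 -> N(0, 36), i.e. sd = 6
sd_fixed <- ifelse(yn == 1, 1, 6)
s  <- rnorm(n, mean = 0, sd = sd_fixed)
sn <- (s - mean(s)) / sd(s)         # standardize as in the question
print(range(sn))
```

With the corrected mapping, the standardized sample should show the occasional large values seen in the MATLAB output.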

Here is a more compact (and perhaps more idiomatic) version of your MATLAB gaussmix function (I think runif(n)<alpha is slightly more efficient than rbinom(n,size=1,prob=alpha)):

gaussmix <- function(n,m1,m2,s1,s2,alpha) {
    I <- runif(n)<alpha
    rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
set.seed(1001)
s <- gaussmix(1000,0,0,1,6,0.95)
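As a rough check of why values near ±8 are plausible after standardizing, the following sketch compares the theoretical standard deviation of the intended mixture with a sample estimate. It repeats gaussmix so the block is self-contained; the large sample size and the names sd_mix and sd_wide_std are assumptions made for illustration.

```r
gaussmix <- function(n, m1, m2, s1, s2, alpha) {
    I <- runif(n) < alpha
    rnorm(n, mean = ifelse(I, m1, m2), sd = ifelse(I, s1, s2))
}

alpha <- 0.95; s1 <- 1; s2 <- 6
# Both means are 0, so the mixture variance is the weighted sum of variances
sd_mix      <- sqrt(alpha * s1^2 + (1 - alpha) * s2^2)  # sqrt(2.75), about 1.66
sd_wide_std <- s2 / sd_mix          # sd of the wide component after scaling, about 3.62

set.seed(1001)
s  <- gaussmix(1e5, 0, 0, s1, s2, alpha)  # large n so the sample sd is stable
sn <- (s - mean(s)) / sd(s)
print(c(theoretical = sd_mix, sample = sd(s)))
print(mean(abs(sn) > 5))            # small but non-zero tail fraction
```

Because the wide component keeps a standard deviation of about 3.6 after scaling, a value of 8 is only a little over two of its standard deviations out, so a few such points in 1000 samples are expected.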

answered Sep 30 '22 12:09

Ben Bolker

Not that you asked for it, but the mclust package offers a way to generalize your problem to more dimensions and diverse covariance structures. See ?mclust::sim. The example task would be done this way:

require(mclust)
simdata = sim(modelName = "V",
              parameters = list(pro = c(0.95, 0.05),
                                mean = c(0, 0),
                                variance = list(modelName = "V", 
                                                d = 1, 
                                                G = 2,
                                                sigmasq = c(1, 36))),  # variances of N(0,1) and N(0,36)
              n = 1000)
plot(scale(simdata[,2]), type = "h")
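For comparison, a base-R sketch of the same idea for any number of univariate components, without mclust. The helper name rnormmix and its arguments prob, mu and sigma are hypothetical and chosen only for this example.

```r
# Hypothetical helper (not part of mclust): draw n samples from a
# univariate Gaussian mixture with any number of components.
rnormmix <- function(n, prob, mu, sigma) {
    stopifnot(length(prob) == length(mu), length(prob) == length(sigma))
    # Pick a component index for every draw, then sample from that component
    k <- sample(seq_along(prob), size = n, replace = TRUE, prob = prob)
    rnorm(n, mean = mu[k], sd = sigma[k])
}

set.seed(1001)                      # arbitrary seed, only for reproducibility
x <- rnormmix(1000, prob = c(0.95, 0.05), mu = c(0, 0), sigma = c(1, 6))
print(summary(as.numeric(scale(x))))
```

Calling plot(scale(x), type = "h") on the result should give the same kind of plot as the mclust version.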

answered Sep 30 '22 12:09

Zoë Clark
