
How to generate data from Gaussian distributions for these 2 scenarios in R?

In "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman, these 2 scenarios are stated when comparing least squares/linear models with k-nearest neighbors (kNN):

Scenario 1: The training data in each class were generated from bivariate Gaussian distributions with uncorrelated components and different means.

Scenario 2: The training data in each class came from a mixture of 10 low-variance Gaussian distributions, with individual means themselves distributed as Gaussian.

The idea is that the first scenario is better suited to least squares/linear models and the second to kNN-like models (which, as I understand it, have higher variance because kNN uses only the closest points rather than all points).

In R, how would I simulate data for both scenarios?

The end goal is to reproduce both scenarios in order to show that the 1st one is indeed better explained by the linear model than the 2nd one.

Thanks!

asked Oct 31 '22 20:10 by scc



1 Answer

This could be scenario 1:

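library(mvtnorm)  # provides rmvnorm()

# Sample sizes of the two classes
N1 = 50
N2 = 50
K = 2  # number of classes (not used below)

# Class means: different for each class
mu1 = c(-1,3)
mu2 = c(2,0)

# Class 1 covariance: zero off-diagonal entries, i.e. uncorrelated components
cov1 = 0
v11 = 2
v12 = 2
Sigma1 = matrix(c(v11,cov1,cov1,v12),nrow=2)

# Class 2 covariance: same structure as class 1
cov2 = 0
v21 = 2
v22 = 2
Sigma2 = matrix(c(v21,cov2,cov2,v22),nrow=2)

# One row per observation, one column per component
x1 = rmvnorm(N1,mu1,Sigma1)
x2 = rmvnorm(N2,mu2,Sigma2)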
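Since the components are uncorrelated, the same kind of data can also be drawn with base R only, one coordinate at a time. The sketch below does that and then fits a least-squares classifier on a 0/1 class label. The seed, the 0/1 coding, the 0.5 cut-off and the names `train`, `fit` and `train_error` are choices made for this illustration, not something taken from the book.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

n   <- 50            # observations per class, like N1 and N2 above
mu1 <- c(-1, 3)      # class means, as above
mu2 <- c(2, 0)
s   <- sqrt(2)       # standard deviation matching a variance of 2

# Uncorrelated components: each coordinate is an independent normal draw
x1 <- cbind(rnorm(n, mu1[1], s), rnorm(n, mu1[2], s))
x2 <- cbind(rnorm(n, mu2[1], s), rnorm(n, mu2[2], s))

# Stack both classes and attach a 0/1 label (assumed coding)
X     <- rbind(x1, x2)
train <- data.frame(X1 = X[, 1], X2 = X[, 2], y = rep(c(0, 1), each = n))

# Least squares on the 0/1 response; predict class 1 when the fit exceeds 0.5
fit         <- lm(y ~ X1 + X2, data = train)
pred        <- as.numeric(fitted(fit) > 0.5)
train_error <- mean(pred != train$y)
train_error
```

The training error is optimistic; drawing a second, larger sample the same way and using `predict(fit, newdata = ...)` gives a fairer estimate.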

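This could be a candidate for simulating from a (univariate) Gaussian mixture:

BartSimpson <- function(x,n = 100){ 
   # 10 component means, themselves drawn from a standard normal
   means <- as.matrix(sort(rnorm(10)))
   # mixture density at x: equal weights of 0.1, component sd of 0.1
   dens <- .1*rowSums(apply(means,1,dnorm,x=x,sd=.1)) 
   # n/10 draws from each of the 10 components
   rBartSimpson <- c(apply(means,1,rnorm,n=n/10,sd=.1))
   return(list("thedensity" = dens,"draws" = rBartSimpson))
}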

x <- seq(-5,5,by=.01)

plot(x,BartSimpson(x)$thedensity,type="l",lwd=4,col="yellow2",xlim=c(-4,4),ylim=c(0,0.6))
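
The function above is one-dimensional, while scenario 2 is about two-dimensional data. A possible bivariate sketch follows, assuming the setup described in the book as far as I recall it: 10 component means per class drawn from N((1,0), I) and N((0,1), I), then points drawn from a randomly chosen component with covariance I/5. The helper names (`draw_means`, `draw_points`, `make_data`, `knn_predict`), the sample sizes, the seed and k = 15 are assumptions for illustration. It then compares a least-squares classifier with a simple base-R kNN on fresh test data.

```r
set.seed(2)  # arbitrary seed, only so the run is reproducible

# Component means of one class: n_comp draws from N(center, I)
draw_means <- function(center, n_comp = 10) {
  cbind(rnorm(n_comp, center[1]), rnorm(n_comp, center[2]))
}

# n points: pick a component uniformly at random, then add N(0, comp_var * I) noise
draw_points <- function(n, means, comp_var = 1/5) {
  k <- sample(nrow(means), n, replace = TRUE)
  means[k, , drop = FALSE] + matrix(rnorm(2 * n, sd = sqrt(comp_var)), ncol = 2)
}

# n points per class with a 0/1 label (assumed coding)
make_data <- function(n, m0, m1) {
  X <- rbind(draw_points(n, m0), draw_points(n, m1))
  data.frame(X1 = X[, 1], X2 = X[, 2], y = rep(c(0, 1), each = n))
}

m0 <- draw_means(c(1, 0))
m1 <- draw_means(c(0, 1))
train <- make_data(100, m0, m1)    # training set
test  <- make_data(5000, m0, m1)   # test set from the same mixture

# Least squares on the 0/1 response, cut at 0.5
fit    <- lm(y ~ X1 + X2, data = train)
err_lm <- mean(as.numeric(predict(fit, newdata = test) > 0.5) != test$y)

# Plain kNN by majority vote among the k closest training points
knn_predict <- function(train, test, k = 15) {
  trX <- as.matrix(train[, c("X1", "X2")])
  teX <- as.matrix(test[, c("X1", "X2")])
  apply(teX, 1, function(p) {
    d <- (trX[, 1] - p[1])^2 + (trX[, 2] - p[2])^2
    as.numeric(mean(train$y[order(d)[1:k]]) > 0.5)
  })
}
err_knn <- mean(knn_predict(train, test) != test$y)

c(linear = err_lm, knn = err_knn)
```

The two error rates depend on the random component means, so it is worth repeating the run over several seeds and several values of k before drawing conclusions.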
answered Nov 14 '22 09:11 by Christoph Hanck
