In "Elements of Statistical Learning" by Tibshirani, when comparing least squares/linear models and knn these 2 scenarios are stated:
Scenario 1: The training data in each class were generated from bivariate Gaussian distributions with uncorrelated components and different means.
Scenario 2: The training data in each class came from a mixture of 10 low-variance Gaussian distributions, with individual means themselves distributed as Gaussian.
The idea is that the first scenario is better suited to least squares/linear models and the second to k-NN-like models (those with higher variance, from what I understand, since k-NN only considers the closest points rather than all of them).
In R, how would I simulate data for both scenarios?
The end goal is to reproduce both scenarios in order to show that the first is indeed better explained by the linear model than the second.
Thanks!
This could be scenario 1:
library(mvtnorm)

set.seed(42)  # for reproducibility

N1 = 50  # observations in class 1
N2 = 50  # observations in class 2
K  = 2   # number of classes

# Different means for the two classes
mu1 = c(-1, 3)
mu2 = c(2, 0)

# Diagonal covariance matrices: off-diagonal terms are 0,
# so the components are uncorrelated, as the scenario requires
cov1 = 0
v11 = 2
v12 = 2
Sigma1 = matrix(c(v11, cov1, cov1, v12), nrow = 2)
cov2 = 0
v21 = 2
v22 = 2
Sigma2 = matrix(c(v21, cov2, cov2, v22), nrow = 2)

# Draw the two classes from their bivariate Gaussians
x1 = rmvnorm(N1, mu1, Sigma1)
x2 = rmvnorm(N2, mu2, Sigma2)
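To get at the end goal, one minimal sketch (assuming the `class` package for k-NN; any k-NN implementation would do) stacks the two samples, codes class membership as 0/1 as in ESL, and compares the training error of the linear model against k-NN:

library(class)  # assumed here for knn(); not part of the original code

# Stack the two samples; data.frame() names the columns X1 and X2
df <- data.frame(rbind(x1, x2), y = c(rep(0, N1), rep(1, N2)))

# Linear model: classify as 1 when the fitted value exceeds 0.5
fit_lm  <- lm(y ~ X1 + X2, data = df)
pred_lm <- as.numeric(fitted(fit_lm) > 0.5)

# 15-nearest neighbours, fitted and evaluated on the same points
pred_knn <- knn(train = df[, 1:2], test = df[, 1:2], cl = df$y, k = 15)

# Training error rates; scenario 1 should favour the linear model
mean(pred_lm != df$y)
mean(as.character(pred_knn) != as.character(df$y))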
This could be a candidate for simulating from a Gaussian mixture (here in one dimension, the so-called "Bart Simpson" or claw density):
BartSimpson <- function(x,n = 100){
means <- as.matrix(sort(rnorm(10)))
dens <- .1*rowSums(apply(means,1,dnorm,x=x,sd=.1))
rBartSimpson <- c(apply(means,1,rnorm,n=n/10,sd=.1))
return(list("thedensity" = dens,"draws" = rBartSimpson))
}
x <- seq(-5,5,by=.01)
plot(x,BartSimpson(x)$thedensity,type="l",lwd=4,col="yellow2",xlim=c(-4,4),ylim=c(0,0.6))
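Note that BartSimpson simulates a one-dimensional mixture. Scenario 2 as described in ESL (Section 2.3.3) is bivariate: 10 means per class are drawn from N((1,0), I) and N((0,1), I), and each observation is generated from N(m_k, I/5) with m_k picked uniformly from that class's means. A minimal sketch of that (function and variable names are my own):

library(mvtnorm)

set.seed(1)

# 10 component means per class, themselves bivariate Gaussian
means_blue   <- rmvnorm(10, mean = c(1, 0), sigma = diag(2))
means_orange <- rmvnorm(10, mean = c(0, 1), sigma = diag(2))

# For each observation: pick one of the class's means at random,
# then add low-variance Gaussian noise (covariance I/5)
draw_class <- function(n, means) {
  idx <- sample(nrow(means), n, replace = TRUE)
  means[idx, ] + rmvnorm(n, sigma = diag(2) / 5)
}

x_blue   <- draw_class(100, means_blue)
x_orange <- draw_class(100, means_orange)

# Quick look at the two clouds of Gaussian clusters
plot(rbind(x_blue, x_orange), type = "n", xlab = "X1", ylab = "X2")
points(x_blue, col = "blue")
points(x_orange, col = "orange")

The same lm-vs-knn comparison as above can then be run on these data; with the clustered class boundaries, k-NN should now come out ahead.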