I apologize in advance if my question seems really simplistic or naive, but I'm trying to understand what the function simulate
does, conceptually (i.e., I'm interested in the logic of it, independently of whether it's applied to lm, lme, etc.)
Say I'm doing a simple multiple regression on the following data:
n <- 40
x1 <- rnorm(n, mean=3, sd=1)
x2 <- rnorm(n, mean=4, sd=1.25)
y <- 2*x1 + 3*x2 + rnorm(n, mean=2, sd=1)
mydata <- data.frame(x1, x2, y)
mod <- lm(y ~ x1 + x2, data=mydata)
What does the function simulate
do when applied to that case? So if I do:
simulate(mod, nsim=2)
What are the two vectors I obtain?
In essence, is it similar to doing:
replicate(2, y + rnorm(n=length(y), mean="some value", sd="some other value"))
If it is similar to that logic, then what would "some value" and "some other value" be? Would they be mean(mod$residuals)
and sd(mod$residuals)
? Or a permutation of actual residuals? Or something else entirely?
Or is it doing something completely different?
If anyone could explain/confirm how simulate
works in simple, non-technical terms, it would be greatly appreciated.
It basically does what the help file says: 'Simulate one or more responses from the distribution corresponding to a fitted model object.'
So for each simulation a random draw is taken from the conditional distribution of the outcome variable conditional on the covariates. This conditional distribution is a normal distribution by default in lm
. The standard deviation of this normal distribution corresponds to the sqrt of MSE of mod
.
The code below replicates the output (given you use the same seed):
set.seed(1)
head(simulate(mod, nsim=2))
set.seed(1)
for(i in 1:nsim) {
tmp <- predict(mod) + rnorm(length(predict(mod)), 0, summary(mod)$sigma)
res <- if (i==1) tmp else cbind(res, tmp)
}
head(res)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With