When simulating multivariate data for regression, how can I set the R-squared (example code included)?

Tags:

r

I am trying to simulate a three-variable dataset so that I can run linear regression models on it. 'X1' and 'X2' would be continuous independent variables (mean=0, sd=1), and 'Y' would be the continuous dependent variable.

The regression model should produce coefficients like this: Y = 5 + 3(X1) - 2(X2)

I would like to simulate this dataset such that the resulting regression model has an R-squared value of 0.2. How can I determine the value of 'sd.value' so that the regression model has this R-squared?

n <- 200 
set.seed(101) 
sd.value <- 1

X1 <- rnorm(n, 0, 1)
X2 <- rnorm(n, 0, 1)
Y <- rnorm(n, (5 + 3*X1 - 2*X2), sd.value)

simdata <- data.frame(X1, X2, Y)

summary(lm(Y ~ X1 + X2, data=simdata))


asked Sep 30 '13 14:09

Slyron

1 Answer

Take a look at this code; it should be close to what you want:

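# Simulate y = beta[1] + beta[2]*x1 + beta[3]*x2 + epsilon with a target population R-squared
simulate <- function(n.obs=10^4, beta=c(5, 3, -2), R.sq=0.8) {
    stopifnot(length(beta) == 3)
    df <- data.frame(x1=rnorm(n.obs), x2=rnorm(n.obs))  # x1 and x2 are independent
    # Error variance implied by the target R-squared (var[x1] = var[x2] = 1)
    var.epsilon <- (beta[2]^2 + beta[3]^2) * (1 - R.sq) / R.sq
    stopifnot(var.epsilon > 0)
    df$epsilon <- rnorm(n.obs, sd=sqrt(var.epsilon))
    df$y <- with(df, beta[1] + beta[2]*x1 + beta[3]*x2 + epsilon)
    return(df)
}
# Fit the model on simulated data and return the R-squared actually obtained
get.R.sq <- function(desired) {
    model <- lm(y ~ x1 + x2, data=simulate(R.sq=desired))
    return(summary(model)$r.squared)
}
# Compare desired and actual R-squared over a grid; points should lie near the dashed line
df <- data.frame(desired.R.sq=seq(from=0.05, to=0.95, by=0.05))
df$actual.R.sq <- sapply(df$desired.R.sq, FUN=get.R.sq)
plot(df)
abline(a=0, b=1, col="red", lty=2)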

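Your question comes down to finding the expression for var.epsilon. Since y = b1 + b2*x1 + b3*x2 + epsilon, and the Xs and epsilon are all independent, we have var[y] = b2^2 * var[x1] + b3^2 * var[x2] + var[eps], where var[x1] = var[x2] = 1 by assumption. The population R-squared is the explained share of that variance, R.sq = (b2^2 + b3^2) / (b2^2 + b3^2 + var[eps]), so solving for var[eps] gives var[eps] = (b2^2 + b3^2) * (1 - R.sq) / R.sq, which is the formula used in simulate(). The R-squared of any single fitted model will scatter around this target, more tightly as n.obs grows.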
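
For the numbers in the question (slopes 3 and -2, target R-squared of 0.2), the variance decomposition above can be applied directly to get sd.value. This is only a minimal sketch, assuming X1 and X2 are independent with unit variance as in the question; target.R.sq and var.signal are names introduced here for illustration.

```r
# Assumption: X1 and X2 are independent with variance 1, as in the question.
beta <- c(5, 3, -2)      # intercept, slope for X1, slope for X2
target.R.sq <- 0.2       # hypothetical name for the desired R-squared

# var[eps] = (b2^2 + b3^2) * (1 - R.sq) / R.sq
var.signal <- sum(beta[-1]^2)                                   # 3^2 + (-2)^2 = 13
sd.value <- sqrt(var.signal * (1 - target.R.sq) / target.R.sq)  # sqrt(52), about 7.21

# Same simulation as in the question, now with the derived sd.value
n <- 200
set.seed(101)
X1 <- rnorm(n, 0, 1)
X2 <- rnorm(n, 0, 1)
Y <- rnorm(n, beta[1] + beta[2]*X1 + beta[3]*X2, sd.value)
simdata <- data.frame(X1, X2, Y)

# The sample R-squared varies around the target; it will not be exactly 0.2
fit <- lm(Y ~ X1 + X2, data = simdata)
r.sq <- summary(fit)$r.squared
print(sd.value)
print(r.sq)
```

With only n = 200 observations the fitted R-squared can land noticeably above or below 0.2 for a given seed; increasing n brings it closer to the target.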


answered Sep 17 '22 08:09

Adrian
