I have a dataframe with let's say N+2 columns. The first is just dates (mainly used for plotting later on), the second is a variable whose response to the remaining N columns I would like to compute. I'm thinking there must be something like
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
fit = lm(y~df[,2:3],data=df)
This doesn't work. I've also tried and failed with
fit = lm(y~sapply(colnames(df)[2:3],as.name),data=df)
Any thoughts?
The lm() function fits linear models to data in R. It can be used to carry out regression, single-stratum analysis of variance, and analysis of covariance; the fitted model can then be used to predict values for data that is not in the data frame.
The formula notation y ~ . tells lm() to regress y on all of the other variables in the data frame.
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
# fits a model using x1 and x2
fit <- lm(y ~ ., data = df)
# Removes the column containing x1 so regression on x2 only
fit <- lm(y ~ ., data = df[, -2])
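For the layout described in the question (a date column first, then the response, then the predictors), the same idea might look like the sketch below. The column name `date` and the generated data are assumptions made up for illustration; the term labels of the fit show which predictors the dot expanded to.

```r
# Hypothetical data in the shape the question describes:
# a date column, a response y, then the predictors.
set.seed(1)
df <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 10),
  y    = 1:10,
  x1   = runif(10),
  x2   = rnorm(10)
)

# Drop the date column (column 1) before fitting, so that "."
# expands to x1 and x2 only.
fit <- lm(y ~ ., data = df[, -1])

# The term labels show which predictors "." expanded to.
predictors <- attr(terms(fit), "term.labels")
print(predictors)
```

The full data frame, including `date`, stays available for plotting afterwards.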
There is an alternative to Dason's answer for when you want to specify the columns to exclude by name: use subset() and specify the select argument:
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
fit = lm(y ~ ., data = subset(df, select=-x1))
Trying to use df[, -c("x1")] fails with "invalid argument to unary operator".
It can extend to excluding multiple columns: subset(df, select = -c(x1,x2))
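The unary-operator error mentioned above comes from trying to negate a character vector. If the names to drop are held as strings, one possible workaround is setdiff() on the column names, sketched here; `drop_cols` and `keep_cols` are hypothetical variable names used only for this example.

```r
set.seed(1)
df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))

# Negating a character vector is what raises the error;
# capture the message instead of stopping the script.
err <- tryCatch(df[, -c("x1")], error = function(e) conditionMessage(e))

# Column names stored as strings can be removed with setdiff().
drop_cols <- c("x1")
keep_cols <- setdiff(names(df), drop_cols)

# List-style indexing df[keep_cols] always returns a data frame,
# even if only one column is kept.
fit <- lm(y ~ ., data = df[keep_cols])
```

This is handy when the columns to exclude are computed at run time rather than typed into the call.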
And you can still use numeric column indices:
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
fit = lm(y ~ ., data = subset(df, select = -2))
(That is equivalent to subset(df, select=-x1) because x1 is the 2nd column.)
Naturally you can also use this to specify the columns to include.
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
fit = lm(y ~ ., data = subset(df, select=c(y,x2)) )
(Yes, that is equivalent to lm(y ~ x2, df), but it is distinct if you were then going to use step(), for instance.)
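The equivalence claimed in the parenthesis can be checked directly. This is only a quick sketch comparing the coefficients of the two fits; `fit_dot` and `fit_named` are names chosen for this example, and it does not cover how step() treats the two models.

```r
set.seed(1)
df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))

# Dot formula on a data frame restricted to y and x2 ...
fit_dot <- lm(y ~ ., data = subset(df, select = c(y, x2)))

# ... versus naming the predictor explicitly on the full data frame.
fit_named <- lm(y ~ x2, data = df)

# Both fits estimate the same intercept and slope.
same_coefs <- isTRUE(all.equal(coef(fit_dot), coef(fit_named)))
print(same_coefs)
```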
I am fairly new to R, but I found another way to do this for named columns in a data frame. Say you want to run a regression using all columns except for column x2; then you'll write:
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
# Removes the column containing x2 so regression on x1 only
model <- lm(y ~ . - x2, data = df)
# to remove more columns (assuming the data frame also had columns x3 and x4;
# with the df above this call fails because x3 and x4 do not exist)
model <- lm(y ~ . - x2 - x3 - x4, data = df)
The rest of the answers are pretty old, so maybe it's a new feature, but it's pretty neat!
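To see the subtraction syntax with more than one excluded column, here is a small sketch on a data frame that actually contains x3 and x4. Those two columns and their values are invented for this example, and the response is written in lower case (`y`) to match the column name.

```r
set.seed(1)
# Hypothetical data frame with two extra predictors, x3 and x4.
df <- data.frame(
  y  = 1:10,
  x1 = runif(10),
  x2 = rnorm(10),
  x3 = rnorm(10),
  x4 = runif(10)
)

# Everything except x2.
fit_one <- lm(y ~ . - x2, data = df)

# Everything except x2, x3 and x4, which leaves x1 only.
fit_many <- lm(y ~ . - x2 - x3 - x4, data = df)

# Inspect which predictors remain in each model.
labels_one  <- attr(terms(fit_one), "term.labels")
labels_many <- attr(terms(fit_many), "term.labels")
print(labels_one)
print(labels_many)
```

Unlike subsetting the data frame, this keeps all columns in `data` and only changes the formula.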