
Multiple regression leave out one variable (column)

When a data frame has many columns and you want to leave out just one or two of them while including everything else in a multiple regression, how can you do that without writing out a long formula?

For example, to include all columns:

lm(y ~ ., data = myFrame)

Or, to hand-pick variables one by one:

lm(y ~ x1 + x2 + x3, data = myFrame)

But if you have 50 variables and want to leave out only a few, what is the best way? I want to leave out two or three, include all the rest, and then do forward and backward selection.

asked Apr 01 '16 16:04 by add-semi-colons



1 Answer

Use the . operator for "everything in the data frame except the response variable" and the - operator for "but leave these out" ...

lm(y ~ . - excluded_1 - excluded_2, data = myFrame)
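
As a quick check of the formula above, here is a minimal sketch using R's built-in mtcars data in place of myFrame, with mpg as the response and cyl and disp as the excluded columns (these choices are assumptions made purely for illustration). It also shows one possible way to build the same formula from a character vector of names, and to run stepwise selection with step() from the reduced model, since the question mentions forward and backward selection.

```r
# mtcars stands in for myFrame (an assumption for illustration);
# mpg is the response, cyl and disp are the columns to leave out.
fit <- lm(mpg ~ . - cyl - disp, data = mtcars)

# The excluded columns should not appear among the fitted terms.
attr(terms(fit), "term.labels")

# Alternative: build the formula from column names, which is handy
# when the exclusions are stored in a character vector.
excluded   <- c("cyl", "disp")
predictors <- setdiff(names(mtcars), c("mpg", excluded))
fit2 <- lm(reformulate(predictors, response = "mpg"), data = mtcars)

# AIC-based stepwise selection starting from the reduced model.
# With no scope given, the starting model is the upper limit, so the
# excluded columns are not added back.
selected <- step(fit, direction = "both", trace = 0)
attr(terms(selected), "term.labels")
```

Both fits should give the same coefficients; the reformulate() version is mainly useful when the list of exclusions is computed rather than typed by hand.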
answered Oct 04 '22 00:10 by Ben Bolker