 

Best way in R to pick which level is the base category for a factor in an lm regression

Tags: r, r-factor, lm

Suppose I want to run a regression using lm() with a factor as a right-hand-side variable. What is the best way to choose which level of the factor is the base category (the one that is excluded to avoid multicollinearity)? Note that I am not interested in excluding the intercept, because I have many factors.

I would also like a formula-based solution, not one that acts on the data.frame directly, although if you think you have a really good solution for that, please post it as well.

My solution is:

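# Put level x first, followed by the remaining levels 1..n (setdiff() also handles x = 1 or x = n)
base_cat <- function(x, n = 100) c(x, setdiff(1:n, x))
a_reg <- lm(y ~ x1 + x2 + factor(x3, levels = base_cat(30)))  # suppose that x3 has draws from the integers 1 to 100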

With the default treatment contrasts, lm leaves out the first level of the factor, so this simply reorders the levels: the one passed to base_cat() comes first and the rest follow.
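A minimal, self-contained sketch of the approach above, assuming simulated data: y, x1, x2 and x3 are made up here, and base_cat() is written with setdiff() so that it also works when the chosen base is 1 or 100. It checks that level 30 has no coefficient of its own, i.e. that it became the reference category.

```r
# Simulated data standing in for the question's variables (assumption)
set.seed(1)
n_obs <- 2000
x1 <- rnorm(n_obs)
x2 <- rnorm(n_obs)
x3 <- sample(1:100, n_obs, replace = TRUE)  # integer draws from 1 to 100
y <- 1 + 2 * x1 - x2 + x3 / 10 + rnorm(n_obs)

# Put level x first, followed by the remaining levels 1..n
base_cat <- function(x, n = 100) c(x, setdiff(1:n, x))

a_reg <- lm(y ~ x1 + x2 + factor(x3, levels = base_cat(30)))

# Intercept, x1, x2 and one dummy for each of the 99 non-base levels
coef_names <- names(coef(a_reg))
print(length(coef_names))
print(head(coef_names))
```

Names of the factor coefficients end in the level label, so no name ending in "30" means level 30 is the omitted base.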

Any other ideas?

asked Oct 19 '11 21:10 by Xu Wang


People also ask

How do you find the level of a factor variable in R?

You can check whether a variable is a factor with the class() function (or is.factor()), and list its levels with the levels() function.
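A short sketch of these checks on a small, hypothetical factor f. By default, factor() sorts the levels, so the first level (the default base category in a regression) is the alphabetically first one.

```r
# A small example factor (hypothetical data)
f <- factor(c("low", "high", "medium", "low"))

class(f)      # "factor"
is.factor(f)  # TRUE
levels(f)     # "high" "low" "medium" (sorted by default)
nlevels(f)    # 3
```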

How do you specify reference levels in R?

To set the reference level of a factor manually in R, use the relevel() function. It reorders the factor's levels so that the level you specify comes first and the others follow in their original order.
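A minimal sketch of relevel() on a hypothetical factor: only the order of the levels changes, not the data values.

```r
f <- factor(c("low", "high", "medium", "low"))
levels(f)   # "high" "low" "medium"

# Make "medium" the reference (first) level; the others keep their order
f2 <- relevel(f, ref = "medium")
levels(f2)  # "medium" "high" "low"

# The observations themselves are unchanged
identical(as.character(f), as.character(f2))  # TRUE
```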

How do you choose the best variables for a linear regression?

When building a linear or logistic regression model, consider including variables that earlier studies have shown to be related to the outcome, variables that can be considered a cause of the exposure, the outcome, or both, and interaction terms between variables that have large main effects.


1 Answer

The function relevel() does precisely this: you pass it an unordered factor and the name of the reference level, and it returns a factor with that level first.
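A sketch of using relevel() both ways the question asks about, assuming simulated data in a data frame d (the data and the base level "30" are illustrative). Calling relevel() inside the formula gives the formula-based solution; the second fit changes the data frame instead. Note that ref is given as the level name "30", not as a number.

```r
# Simulated data (assumption)
set.seed(2)
n_obs <- 2000
d <- data.frame(
  x1 = rnorm(n_obs),
  x2 = rnorm(n_obs),
  x3 = sample(1:100, n_obs, replace = TRUE)
)
d$y <- 1 + 2 * d$x1 - d$x2 + d$x3 / 10 + rnorm(n_obs)

# Formula-based: relevel() applied inside the model formula
fit_formula <- lm(y ~ x1 + x2 + relevel(factor(x3), ref = "30"), data = d)

# Data-based alternative: set the reference level in the data frame itself
d$x3f <- relevel(factor(d$x3), ref = "30")
fit_data <- lm(y ~ x1 + x2 + x3f, data = d)

# Both fits use level "30" as the base, so the estimates agree
all.equal(unname(coef(fit_formula)), unname(coef(fit_data)))  # TRUE
```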

answered Nov 04 '22 08:11 by joran