
Dropping variable in lm formula still triggers contrast error

I'm trying to run lm() on only a subset of my data, and running into an issue.

library(data.table)
dt = data.table(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = as.factor(c(rep('men',50), rep('women',50)))) # sample data

lm( y ~ ., dt) # Use all x: Works
lm( y ~ ., dt[x3 == 'men']) # Use all x, limit to men: doesn't work (as expected)

The above doesn't work because the dataset now contains only men, so we can't include x3, the gender variable, in the model. BUT...

lm( y ~ . -x3, dt[x3 == 'men']) # Exclude x3, limit to men: STILL doesn't work
lm( y ~ x1 + x2, dt[x3 == 'men']) # Exclude x3, with different notation: works great

Is this an issue with the "minus sign" notation in the formula? Please advise. Note: Of course I can do it a different way; for example, I could exclude the variables before passing the data to lm(). But I'm teaching a class on this stuff, and I don't want to confuse the students, having already told them they can exclude a variable using a minus sign in the formula.

Zhaochen He asked Feb 12 '20 23:02




1 Answer

You get the error because x3 is still in the model with only one level, "men" (see the comment below from @Artem Sokolov).
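
To see where x3 sneaks back in, here is a minimal sketch. It assumes a base data.frame instead of the data.table from the question (so it runs without extra packages), and the names df, men, tt, mf and err are only illustrative. As far as I can tell, the minus sign removes x3 from the fitted terms, but x3 stays in the list of variables that go into the model frame, and lm() drops unused factor levels when it builds that frame.

```r
# Sample data mirroring the question, as a base data.frame (assumption)
set.seed(1)
df <- data.frame(
  y  = rnorm(100),
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = factor(rep(c("men", "women"), each = 50))
)
men <- df[df$x3 == "men", ]

# Expand the dot against the data, then remove x3 with the minus sign
tt <- terms(y ~ . - x3, data = men)

# Predictors that would actually be fitted
term_labels <- attr(tt, "term.labels")

# Variables the model frame still has to hold
frame_vars <- all.vars(attr(tt, "variables"))

# Build the model frame the way lm() asks for it: unused levels dropped
mf <- model.frame(tt, data = men, drop.unused.levels = TRUE)
x3_levels <- nlevels(mf$x3)

# The fit fails; capture the message instead of stopping the script
err <- tryCatch(
  lm(y ~ . - x3, data = men),
  error = function(e) conditionMessage(e)
)

print(term_labels)
print(frame_vars)
print(x3_levels)
print(err)
```

If x3 shows up in frame_vars but not in term_labels, that is the mismatch: the factor is not a predictor, yet it is still present, with a single level, when contrasts are set up.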

One way to solve it is to subset ahead of time:

dt = data.table(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = as.factor(c(rep('men',50), rep('women',50)))) # sample data

dmen <- dt[x3 == 'men'] # create a new subsetted dataset with just men

lm( y ~ ., dmen[,-"x3"]) # now drop the x3 column from the dataset (just for the model)

Or you can do both in the same step:

lm( y ~ ., dt[x3 == 'men',-"x3"])
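
If you would rather not touch the data at all, two other routes might work for a class. This is only a sketch under assumptions: it uses a base data.frame rather than data.table, and drop_constant_factors() is a hypothetical helper written for this example, not a function from any package.

```r
# Sample data mirroring the question, as a base data.frame (assumption)
set.seed(1)
df <- data.frame(
  y  = rnorm(100),
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = factor(rep(c("men", "women"), each = 50))
)
men <- df[df$x3 == "men", ]

# Option A: build the formula from the column names, leaving x3 out entirely,
# so it never becomes part of the model frame
predictors <- setdiff(names(men), c("y", "x3"))
f <- reformulate(predictors, response = "y")
fit_a <- lm(f, data = men)

# Option B: hypothetical helper that drops every factor column left with
# fewer than two levels after subsetting (written for data.frame input)
drop_constant_factors <- function(d) {
  keep <- vapply(
    d,
    function(col) !is.factor(col) || nlevels(droplevels(col)) >= 2,
    logical(1)
  )
  d[, keep, drop = FALSE]
}
fit_b <- lm(y ~ ., data = drop_constant_factors(men))

print(f)
print(coef(fit_a))
print(coef(fit_b))
```

Option A keeps the "everything except x3" idea but spells the formula out explicitly; Option B keeps y ~ . and cleans the data instead, which is the same idea as the answer above, just generalized to any single-level factor.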
Dylan_Gomes answered Sep 17 '22 09:09