Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Different NA actions for coefficients and summary of linear model in R

Tags:

r

na

lm

summary

In R, when using lm(), if I set na.action = na.pass inside the call to lm(), then in the summary table there is an NA for any coefficient that cannot be estimated (because of missing cells in this case).

If, however, I extract just the coefficients from the summary object, using either summary(myModel)$coefficients or coef(summary(myModel)), then the NA's are omitted.

I want the NA's to be included when I extract the coefficients the same way that they are included when I print the summary. Is there a way to do this?

Setting options(na.action = na.pass) does not seem to help.

Here is an example:

> set.seed(534)
> myGroup1 <- factor(c("a","a","a","a","b","b"))
> myGroup2 <- factor(c("first","second","first","second","first","first"))
> myDepVar <- rnorm(6, 0, 1)
> myModel <- lm(myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)
> summary(myModel)

Call:
lm(formula = myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)

Residuals:
       1        2        3        4        5        6 
-0.05813  0.55323  0.05813 -0.55323 -0.12192  0.12192 

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         -0.15150    0.23249  -0.652    0.561
myGroup11            0.03927    0.23249   0.169    0.877
myGroup21           -0.37273    0.23249  -1.603    0.207
myGroup11:myGroup21       NA         NA      NA       NA

Residual standard error: 0.465 on 3 degrees of freedom
Multiple R-squared: 0.5605,     Adjusted R-squared: 0.2675 
F-statistic: 1.913 on 2 and 3 DF,  p-value: 0.2914 

> coef(summary(myModel))
               Estimate Std. Error    t value  Pr(>|t|)
(Intercept) -0.15149826  0.2324894 -0.6516352 0.5611052
myGroup11    0.03926774  0.2324894  0.1689012 0.8766203
myGroup21   -0.37273117  0.2324894 -1.6032180 0.2072173

> summary(myModel)$coefficients
               Estimate Std. Error    t value  Pr(>|t|)
(Intercept) -0.15149826  0.2324894 -0.6516352 0.5611052
myGroup11    0.03926774  0.2324894  0.1689012 0.8766203
myGroup21   -0.37273117  0.2324894 -1.6032180 0.2072173
like image 934
Jdub Avatar asked Jun 07 '12 14:06

Jdub


People also ask

What are the NAS for the coefficients mean?

NA as a coefficient in a regression indicates that the variable in question is linearly related to the other variables. In your case, this means that Q3=a×Q1+b×Q2+c for some a,b,c. If this is the case, then there's no unique solution to the regression without dropping one of the variables.

What does summary of lm return in R?

Value. The function summary.lm computes and returns a list of summary statistics of the fitted linear model given in object , using the components (list elements) "call" and "terms" from its argument, plus. residuals.

How do I remove Na from regression in R?

The na. omit() function returns a list without any rows that contain na values. It will drop rows with na value / nan values. This is the fastest way to remove na rows in the R programming language.


1 Answers

Why don't you just extract the coefficients from the fitted model:

> coef(myModel)
             (Intercept)                myGroup1b 
             -0.48496169              -0.07853547 
          myGroup2second myGroup1b:myGroup2second 
              0.74546233                       NA

That seems the easiest option.

na.action has nothing to do with this. Note that you didn't pass na.action = na.pass in your example.

na.action is a global option for handling NA in the data passed to a model fit, usually in conjunction with a formula; it is also the name of a function na.action(). R builds up the so called model frame from the data argument and the symbolic representation of the model expressed in the formula. At this point, any NA would be detected and the default option for na.action is to use na.omit() to remove the NA from the data by dropping samples with NA for any variable. There are alternatives, most usefully na.exclude(), which would remove NA during fitting but add back NA in the correct places in the fitted values, residuals etc. Read ?na.omit and ?na.action for more, plus ?options for more on this.

like image 82
Gavin Simpson Avatar answered Oct 15 '22 06:10

Gavin Simpson