In R, when using lm(), if I set na.action = na.pass inside the call to lm(), then in the summary table there is an NA for any coefficient that cannot be estimated (because of missing cells in this case).
If, however, I extract just the coefficients from the summary object, using either summary(myModel)$coefficients or coef(summary(myModel)), then the NA's are omitted.
I want the NA's to be included when I extract the coefficients the same way that they are included when I print the summary. Is there a way to do this?
Setting options(na.action = na.pass) does not seem to help.
Here is an example:
> set.seed(534)
> myGroup1 <- factor(c("a","a","a","a","b","b"))
> myGroup2 <- factor(c("first","second","first","second","first","first"))
> myDepVar <- rnorm(6, 0, 1)
> myModel <- lm(myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)
> summary(myModel)

Call:
lm(formula = myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)

Residuals:
       1        2        3        4        5        6
-0.05813  0.55323  0.05813 -0.55323 -0.12192  0.12192

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         -0.15150    0.23249  -0.652    0.561
myGroup11            0.03927    0.23249   0.169    0.877
myGroup21           -0.37273    0.23249  -1.603    0.207
myGroup11:myGroup21       NA         NA      NA       NA

Residual standard error: 0.465 on 3 degrees of freedom
Multiple R-squared: 0.5605, Adjusted R-squared: 0.2675
F-statistic: 1.913 on 2 and 3 DF, p-value: 0.2914
> coef(summary(myModel))
               Estimate Std. Error    t value  Pr(>|t|)
(Intercept) -0.15149826  0.2324894 -0.6516352 0.5611052
myGroup11    0.03926774  0.2324894  0.1689012 0.8766203
myGroup21   -0.37273117  0.2324894 -1.6032180 0.2072173
> summary(myModel)$coefficients
               Estimate Std. Error    t value  Pr(>|t|)
(Intercept) -0.15149826  0.2324894 -0.6516352 0.5611052
myGroup11    0.03926774  0.2324894  0.1689012 0.8766203
myGroup21   -0.37273117  0.2324894 -1.6032180 0.2072173
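An NA coefficient in a regression indicates that the corresponding column of the design matrix is a linear combination of the other columns. In your case, no observation has myGroup1 = "b" together with myGroup2 = "second", so the interaction column carries no information beyond the intercept and the two main effects. When this happens, there is no unique solution to the regression without dropping one of the terms, and lm() reports the dropped one as NA.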
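To see the dependence in the question's data, you can tabulate the two factors and compare the rank of the design matrix with its number of columns. This is only a rough sketch; X is an illustrative name, and the factors are copied from the example above.

```r
# Factors copied from the question's example
myGroup1 <- factor(c("a", "a", "a", "a", "b", "b"))
myGroup2 <- factor(c("first", "second", "first", "second", "first", "first"))

# Cross-tabulation: the ("b", "second") cell has no observations
table(myGroup1, myGroup2)

# Design matrix for main effects plus interaction
X <- model.matrix(~ myGroup1 * myGroup2)

# Fewer independent columns than columns in total means
# at least one coefficient cannot be estimated
ncol(X)
qr(X)$rank
```

If the rank is smaller than the number of columns, lm() will report NA for the coefficients it had to drop.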
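From the Value section of ?summary.lm: the function summary.lm computes and returns a list of summary statistics of the fitted linear model given in object, using the components (list elements) "call" and "terms" from its argument, plus further components such as residuals and coefficients. The coefficients matrix omits aliased coefficients; a separate aliased component is a named logical vector showing which of the original coefficients are aliased.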
The na.omit() function returns its argument with incomplete cases removed: for a data frame, any row containing an NA (or NaN) value is dropped. It is a simple way to remove rows with missing values in R.
Why don't you just extract the coefficients from the fitted model:
> coef(myModel)
             (Intercept)                myGroup1b
             -0.48496169              -0.07853547
          myGroup2second myGroup1b:myGroup2second
              0.74546233                       NA
That seems the easiest option.
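If you need the full table (standard errors, t values and p-values) rather than just the estimates, one option is to pad the summary matrix yourself. A minimal sketch, assuming the aliased component that summary.lm returns (a named logical vector with one entry per coefficient); the names s, est and full are just illustrative. The row names depend on the contrasts in effect, so they may not match the labels printed in the question.

```r
# Data and model copied from the question's example
set.seed(534)
myGroup1 <- factor(c("a", "a", "a", "a", "b", "b"))
myGroup2 <- factor(c("first", "second", "first", "second", "first", "first"))
myDepVar <- rnorm(6, 0, 1)
myModel <- lm(myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)

s <- summary(myModel)
est <- coef(s)  # only the estimable coefficients

# Start from an all-NA matrix with one row per coefficient in the model,
# then copy the estimable rows in by name
full <- matrix(NA_real_,
               nrow = length(s$aliased), ncol = ncol(est),
               dimnames = list(names(s$aliased), colnames(est)))
full[rownames(est), ] <- est
full
```

The result has the same rows, in the same order, as coef(myModel), with NA in every column for the coefficient that could not be estimated.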
na.action has nothing to do with this. Note that you didn't pass na.action = na.pass in your example.
na.action is a global option for handling NA in the data passed to a model fit, usually in conjunction with a formula; it is also the name of a function, na.action(). R builds up the so-called model frame from the data argument and the symbolic representation of the model expressed in the formula. At this point, any NA would be detected, and the default for the na.action option is na.omit(), which removes the NA from the data by dropping samples with NA for any variable. There are alternatives, most usefully na.exclude(), which removes NA during fitting but adds NA back in the correct places in the fitted values, residuals, etc. Read ?na.omit and ?na.action for more on these functions, plus ?options for the global setting.
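To illustrate the difference between na.omit() and na.exclude(), here is a small sketch on made-up data (the data frame d and the fit_* names are invented for this example and are not from the question):

```r
# Toy data with one missing response value (row 3)
d <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0),
                x = c(1, 2, 3, 4, 5))

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

# Both fits use the same four complete rows, so the coefficients agree
coef(fit_omit)
coef(fit_excl)

# na.omit drops the incomplete row from the residuals (length 4);
# na.exclude pads it back in as NA (length 5, NA in position 3)
residuals(fit_omit)
residuals(fit_excl)
```

Neither setting affects which coefficients are reported as NA; that depends only on the rank of the design matrix.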