How do I deal with NAs in residuals in a regression in R?

Tags:

So I am having some issues with some NA values in the residuals of a lm cross sectional regression in R.

The issue isn't the NA values themselves, it's the way R presents them.

For example:

test$residuals
#          1          2          4          5 
#  0.2757677 -0.5772193 -5.3061303  4.5102816 
test$residuals[3]
#        4 
# -5.30613

In this simple example a NA value will make one of the residuals go missing. When I extract the residuals I can clearly see the third index missing. So far so good, no complaints here. The problem is that the corresponding numeric vector is now one item shorter so the third index is actually the fourth. How can I make R return these residuals instead, i.e., explicitly showing NA instead of skipping an index?

test$residuals
#          1          2          3          4          5 
#  0.2757677 -0.5772193         NA -5.3061303  4.5102816

I need to keep track of all individual residuals so it would make my life much easier if I could extract them this way instead.

242

asked Jul 30 '11 11:07

c00kiemonster

3 Answers

Yet another idea is to take advantage of the row names associated with the data frame provided as input to lm. In that case, the residuals should retain the names from the source data. Accessing the residuals from your example would give a value of -5.3061303 for test$residuals["4"] and NA for test$residuals["3"].

However, this does not exactly answer your question. One approach to doing exactly what you asked for in terms of getting the NA values back into the residuals is illustrated below:

> D<-data.frame(x=c(NA,2,3,4,5,6),y=c(2.1,3.2,4.9,5,6,7),residual=NA)
> Z<-lm(y~x,data=D)
> D[names(Z$residuals),"residual"]<-Z$residuals
> D
   x   y residual
1 NA 2.1       NA
2  2 3.2    -0.28
3  3 4.9     0.55
4  4 5.0    -0.22
5  5 6.0    -0.09
6  6 7.0     0.04

If you are doing predictions based on the regression results, you may want to specify na.action=na.exclude in lm. See the help results for na.omit for a discussion. Note that simply specifying na.exclude does not actually put the NA values back into the residuals vector itself.

As noted in a prior answer, resid (synonym for residuals) provides a generic access function in which the residuals will contain the desired NA values if na.exclude was specified in lm. Using resid is probably more general and a cleaner approach. In that case, the code for the above example would be changed to:

> D<-data.frame(x=c(NA,2,3,4,5,6),y=c(2.1,3.2,4.9,5,6,7),residual=NA)
> Z<-lm(y~x,data=D,na.action=na.exclude)
> D$residuals<-residuals(Z)

186

answered Oct 26 '22 11:10

J Baker

I just found this googling around a bit deeper. The resid function on a lm with na.action=na.exclude is the way to go.

answered Oct 26 '22 12:10

c00kiemonster

Here an illustrated strategy using a slightly modified example on the lm help page. This is a direct application of the definition of residual:

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
# Two NA's introduced
weight <- c(4.17,5.58,NA,6.11,4.50,4.61,5.17,4.53,5.33,5.14,
 4.81,4.17,4.41,3.59,5.87,3.83,6.03,NA,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
lm.D9 <- lm(weight ~ group)
rr2 <- weight- predict(lm.D9, na.action=na.pass)
Warning message:
In weight - predict(lm.D9, na.action = na.pass) :
  longer object length is not a multiple of shorter object length
> rr2
 [1] -0.8455556  0.5644444         NA  1.0944444 -0.5155556 -0.4055556  0.1544444
 [8] -0.4855556  0.3144444  0.5044444  0.1744444 -0.4655556 -0.2255556 -1.0455556
[15]  1.2344444 -0.8055556  1.3944444         NA -0.6955556 -0.3255556

I think it would be dangerous to directly modify an lm object so that lm.D9$residual would return that result.

answered Oct 26 '22 11:10

IRTFM

Related questions
                            
                                Disabling buttons in Shiny
                            
                                R: What do you call the :: and ::: operators and how do they differ?
                            
                                Apply a dataframe of boolean values onto another dataframe in R
                            
                                R script in Power BI returns date as Microsoft.OleDb.Date
                            
                                using purrr to affect single columns of each dataframe in a list
                            
                                how to change the color in geom_point or lines in ggplot [duplicate]
                            
                                How to use the Google satellite view as tile in leaflet with R
                            
                                Saving ggplot graph to PDF with fonts embedded in r
                            
                                Rscript detect if R script is being called/sourced from another script
                            
                                Merge and Perfectly Align Histogram and Boxplot using ggplot2
                            
                                How to download entire repository from Github using R?
                            
                                dplyr rowwise sum and other functions like max
                            
                                Summing values in R based on column value with dplyr
                            
                                Reading shape file with sf::st_read fails to capture encoding UTF8
                            
                                How can we check if any 2 intervals of a unique ID overlaps?
                            
                                Choose variables based on name (simple regular expression)
                            
                                R code examples/best practices
                            
                                Merge a lot of data frames in R
                            
                                Loading someone else's .rdata file, can't access the data
                            
                                Aggregating values in a data frame based on key

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How do I deal with NAs in residuals in a regression in R?

Tags:

r

missing-data

regression

c00kiemonster

People also ask

3 Answers

J Baker

c00kiemonster

IRTFM

Recent Activity

Donate For Us