linear model when all occurrences of independent variables are NA

Question

I'm looking for suggestions on how to deal with NAs in linear regressions when all occurrences of an independent/explanatory variable are NA (x3 in the example below).

I know the obvious solution would be to exclude the independent/explanatory variable in question from the model, but I am looping through multiple regions and would prefer not to have a different functional form for each region.

Below is some sample data:

set.seed(23409)
n <- 100

time <- seq(1,n, 1)
x1 <- cumsum(runif(n))           
y  <- .8*x1 + rnorm(n, mean=0, sd=2)
x2 <- seq(1,n, 1)       
x3 <- rep(NA, n)            
df <- data.frame(y=y, time=time, x1=x1, x2=x2, x3=x3)

# Quick plot of data
library(ggplot2)
library(reshape2)
df.melt <-melt(df, id=c("time"))

p <- ggplot(df.melt, aes(x=time, y=value)) + 
  geom_line() + facet_grid(variable ~ .)
p

I have read the documentation for lm and tried various na.action settings without success:

lm(y~x1+x2+x3, data=df, singular.ok=TRUE)

lm(y~x1+x2+x3, data=df, na.action=na.omit)
lm(y~x1+x2+x3, data=df, na.action=na.exclude)

lm(y~x1+x2+x3, data=df, singular.ok=TRUE, na.action=na.omit)
lm(y~x1+x2+x3, data=df, singular.ok=TRUE, na.action=na.exclude)

Is there a way to get lm to run without error and simply return a coefficient for the explanatory variable in question that reflects its lack of explanatory power (i.e. either zero or NA)?

Jouni Helske · Accepted Answer

Here's one idea:

set.seed(23409)
n <- 100

time <- seq(1,n, 1)
x1 <- cumsum(runif(n))           
y  <- .8*x1 + rnorm(n, mean=0, sd=2)
x2 <- seq(1,n, 1)       
x3 <- rep(NA, n)            
df <- data.frame(y=y, time=time, x1=x1, x2=x2, x3=x3)

# Replace a column that is entirely NA with zeros; leave other columns untouched
replaceNA <- function(x) {
  if (all(is.na(x))) {
    rep(0, length(x))
  } else {
    x
  }
}

lm(y~x1+x2+x3, data= data.frame(lapply(df,replaceNA)))
Call:
lm(formula = y ~ x1 + x2 + x3, data = data.frame(lapply(df, replaceNA)))

Coefficients:
(Intercept)           x1           x2           x3  
    0.05467      1.01133     -0.10613           NA  

lm(y~x1+x2, data=df)
Call:
lm(formula = y ~ x1 + x2, data = df)

Coefficients:
(Intercept)           x1           x2  
    0.05467      1.01133     -0.10613

So you replace each variable that contains only NAs with a variable that contains only 0s. You get the coefficient value NA for that variable, but all the relevant parts of the model fit are the same (except the QR decomposition, but if information about that is needed, it can easily be modified). Note that the component summary(fit)$aliased of the fitted model (see ?summary.lm and ?alias) might be useful.
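To check the claim that the fits agree, here is a minimal sketch that rebuilds the sample data, fits both models, and compares them. The object names fit_full, fit_reduced, aliased and shared are only illustrative and do not come from the original code.

```r
set.seed(23409)
n <- 100

x1 <- cumsum(runif(n))
y  <- 0.8 * x1 + rnorm(n, mean = 0, sd = 2)
x2 <- seq_len(n)
x3 <- rep(NA, n)                      # explanatory variable that is entirely NA
df <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

# Same helper as above: an all-NA column becomes a column of zeros
replaceNA <- function(x) {
  if (all(is.na(x))) rep(0, length(x)) else x
}

# Full formula on the zero-filled data versus the reduced formula on the raw data
fit_full    <- lm(y ~ x1 + x2 + x3, data = data.frame(lapply(df, replaceNA)))
fit_reduced <- lm(y ~ x1 + x2, data = df)

# Named logical vector: TRUE marks coefficients that could not be estimated
aliased <- summary(fit_full)$aliased
print(aliased)

# Coefficients shared by both models, and the fitted values, should agree
shared <- names(coef(fit_reduced))
print(all.equal(coef(fit_full)[shared], coef(fit_reduced)))
print(all.equal(fitted(fit_full), fitted(fit_reduced)))
```

The aliased vector gives a programmatic way to tell "dropped because it carried no information" apart from an estimated coefficient, which is handy when collecting results from many fits.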

This seems to be related to your other question: Replace lm coefficients in [r]
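For the looping-over-regions use case mentioned in the question, the same idea can be applied per region so that every fit uses one formula. This is only a sketch under assumptions: the data frame regions, its region column, the simulated values, and the helper fit_region are all hypothetical and made up for illustration.

```r
set.seed(1)

# Hypothetical data: x3 is observed in region A but entirely NA in region B
regions <- data.frame(
  region = rep(c("A", "B"), each = 50),
  x1     = runif(100),
  x2     = rnorm(100),
  x3     = c(rnorm(50), rep(NA, 50))
)
regions$y <- 0.8 * regions$x1 + rnorm(100)

# An all-NA column becomes a column of zeros; other columns are untouched
replaceNA <- function(x) {
  if (all(is.na(x))) rep(0, length(x)) else x
}

# Fit the same formula to one region's rows (hypothetical helper)
fit_region <- function(d) {
  lm(y ~ x1 + x2 + x3, data = data.frame(lapply(d, replaceNA)))
}

# One fit per region, then one row of coefficients per region
fits  <- lapply(split(regions, regions$region), fit_region)
coefs <- t(sapply(fits, coef))
print(coefs)
```

Because the check runs on each region's subset, a variable is zero-filled only in the regions where it is missing entirely. A variable that is only partly NA within a region is left alone, so lm's usual na.action still drops those rows.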

Tags:

dataframe

r

na

lm

MikeTP
