
linear model when all occurrences of independent variables are NA

Tags: dataframe, r, na, lm

I'm looking for suggestions on how to deal with NAs in linear regressions when all occurrences of an independent/explanatory variable are NA (i.e. x3 below).

I know the obvious solution would be to exclude the independent/explanatory variable in question from the model, but I am looping through multiple regions and would prefer not to have a different functional form for each region.

Below is some sample data:

set.seed(23409)
n <- 100

time <- seq(1,n, 1)
x1 <- cumsum(runif(n))           
y  <- .8*x1 + rnorm(n, mean=0, sd=2)
x2 <- seq(1,n, 1)       
x3 <- rep(NA, n)            
df <- data.frame(y=y, time=time, x1=x1, x2=x2, x3=x3)

# Quick plot of data
library(ggplot2)
library(reshape2)
df.melt <-melt(df, id=c("time"))

p <- ggplot(df.melt, aes(x=time, y=value)) + 
  geom_line() + facet_grid(variable ~ .)
p

I have read the documentation for lm and tried various na.action settings without success:

lm(y~x1+x2+x3, data=df, singular.ok=TRUE)

lm(y~x1+x2+x3, data=df, na.action=na.omit)
lm(y~x1+x2+x3, data=df, na.action=na.exclude)

lm(y~x1+x2+x3, data=df, singular.ok=TRUE, na.action=na.omit)
lm(y~x1+x2+x3, data=df, singular.ok=TRUE, na.action=na.exclude)

Is there a way to get lm to run without error and simply return a coefficient for the variable in question that reflects its lack of explanatory power (i.e. either zero or NA)?

MikeTP asked Mar 13 '13 22:03


1 Answer

Here's one idea:

set.seed(23409)
n <- 100

time <- seq(1,n, 1)
x1 <- cumsum(runif(n))           
y  <- .8*x1 + rnorm(n, mean=0, sd=2)
x2 <- seq(1,n, 1)       
x3 <- rep(NA, n)            
df <- data.frame(y=y, time=time, x1=x1, x2=x2, x3=x3)

# Replace a column that is entirely NA with zeros; leave other columns unchanged
replaceNA <- function(x) {
  if (all(is.na(x))) {
    rep(0, length(x))
  } else {
    x
  }
}

lm(y~x1+x2+x3, data= data.frame(lapply(df,replaceNA)))
Call:
lm(formula = y ~ x1 + x2 + x3, data = data.frame(lapply(df, replaceNA)))

Coefficients:
(Intercept)           x1           x2           x3  
    0.05467      1.01133     -0.10613           NA  

lm(y~x1+x2, data=df)
Call:
lm(formula = y ~ x1 + x2, data = df)

Coefficients:
(Intercept)           x1           x2  
    0.05467      1.01133     -0.10613 

So you replace each variable that contains only NAs with a variable that contains only 0s. You get NA as the coefficient for that variable, but all the relevant parts of the model fit are the same (except the QR decomposition, but if information about that is needed, it can easily be modified). Note that the component summary(fit)$aliased (see ?alias), where fit is the fitted model, might be useful.
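To check this claim and to show how it might fit into a loop over regions, here is a minimal sketch. The wrapper name fit_region, the region column, and the second region "B" (where x3 is given random values) are hypothetical and only there for illustration; they are not part of the original question.

```r
# Same sample data as in the question
set.seed(23409)
n  <- 100
x1 <- cumsum(runif(n))
y  <- 0.8 * x1 + rnorm(n, mean = 0, sd = 2)
x2 <- seq(1, n, 1)
x3 <- rep(NA, n)
df <- data.frame(y = y, time = seq(1, n, 1), x1 = x1, x2 = x2, x3 = x3)

# Replace a column that is entirely NA with zeros; leave other columns unchanged
replaceNA <- function(x) if (all(is.na(x))) rep(0, length(x)) else x

# Hypothetical wrapper: one formula for every region, whatever is missing
fit_region <- function(d) {
  lm(y ~ x1 + x2 + x3, data = data.frame(lapply(d, replaceNA)))
}

fit_full    <- fit_region(df)
fit_reduced <- lm(y ~ x1 + x2, data = df)

# Estimable coefficients and fitted values should agree with the reduced model
all.equal(coef(fit_full)[c("(Intercept)", "x1", "x2")], coef(fit_reduced))
all.equal(fitted(fit_full), fitted(fit_reduced))

# Named logical vector flagging which terms could not be estimated
summary(fit_full)$aliased

# Hypothetical two-region data: x3 is all NA in region "A", observed in "B"
df_a <- cbind(df, region = "A")
df_b <- cbind(df, region = "B")
df_b$x3 <- rnorm(n)
all_df <- rbind(df_a, df_b)

# Fit each region with the same functional form and collect the coefficients
fits  <- lapply(split(all_df, all_df$region), fit_region)
coefs <- sapply(fits, coef)
coefs
```

With this layout every region yields a coefficient vector of the same length, so the results can be bound into one matrix, and an NA in the x3 row marks the regions where that variable carried no information.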

This seems to relate to your other question: Replace lm coefficients in [r]

Jouni Helske answered Sep 18 '22 00:09