How to deal with NA in a panel data regression?

I am trying to compute predicted values for data containing NAs, based on a model fitted with plm. Here's some sample code:

require(plm)
test.data <- data.frame(id=c(1,1,2,2,3), time=c(1,2,1,2,1), 
   y=c(1,3,5,10,8), x=c(1, NA, 3,4,5))
model <- plm(y ~ x, data=test.data, index=c("id", "time"), 
       model="pooling", na.action=na.exclude)
yhat <- predict(model, test.data, na.action=na.pass)
test.data$yhat <- yhat

When I run the last line I get an error stating that the replacement has 4 rows while data has 5 rows.

I have no idea how to get predict to return a vector of length 5.

If I fit the model with lm instead of plm (as in the line below), I get the expected result.

model <- lm(y ~ x, data=test.data, na.action=na.exclude)
Rodrigo asked Jan 20 '13 18:01



1 Answer

I think this is something that predict.plm ought to handle for you (it seems like an oversight on the package authors' part), but you can use napredict (see ?napredict) to implement it yourself:

 pp <- predict(model, test.data)
 na.stuff <- attr(model$model,"na.action")
 (yhat <- napredict(na.stuff,pp))
 ## [1] 1.371429       NA 5.485714 7.542857 9.600000
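
To see what napredict is doing in isolation, here is a minimal sketch in base R only. It is an illustration under assumptions, not what plm does internally: it swaps in lm on the complete rows so it runs without the plm package (for this pooled model the coefficients should be the same), and the names complete, omitted and short are made up for the example.

```r
# Base R only: napredict() and na.exclude() live in the stats package,
# which is attached by default. lm stands in for plm here (assumption).
test.data <- data.frame(id = c(1, 1, 2, 2, 3), time = c(1, 2, 1, 2, 1),
                        y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))

# na.exclude() drops incomplete rows and records which ones were dropped
# in the "na.action" attribute (here: row 2, with class "exclude").
complete <- na.exclude(test.data)
omitted <- attr(complete, "na.action")

# Fit and predict on the complete rows only: this vector has length 4.
fit <- lm(y ~ x, data = complete)
short <- predict(fit, newdata = complete)

# napredict() re-inserts NA at the recorded positions, restoring length 5,
# so the assignment back into the original data frame works.
yhat <- napredict(omitted, short)
test.data$yhat <- yhat
print(unname(yhat))
## [1] 1.371429       NA 5.485714 7.542857 9.600000
```

The padding only happens because the recorded na.action has class "exclude"; with na.omit the same call would return the length-4 vector unchanged.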
Ben Bolker answered Nov 14 '22 09:11