I am trying to predict fitted values over data containing NAs, based on a model generated by plm. Here's some sample code:
require(plm)
test.data <- data.frame(id=c(1,1,2,2,3), time=c(1,2,1,2,1),
                        y=c(1,3,5,10,8), x=c(1, NA, 3,4,5))
model <- plm(y ~ x, data=test.data, index=c("id", "time"),
             model="pooling", na.action=na.exclude)
yhat <- predict(model, test.data, na.action=na.pass)
test.data$yhat <- yhat
When I run the last line I get an error stating that the replacement has 4 rows while data has 5 rows.
I have no idea how to get predict to return a vector of length 5...
If I fit an lm instead of a plm (as in the line below), I get the expected result.
model <- lm(y ~ x, data=test.data, na.action=na.exclude)
Simple approaches include replacing the missing values with the column mean or, if the distribution is heavily skewed, the median. A better approach is to run regression or nearest-neighbor imputation on the column to predict the missing values, then continue with your analysis or model.
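As a rough illustration of the mean and median options, here is a minimal base R sketch. The vector reuses the x column from the question; the names x_mean and x_median are just made up for the example.

```r
# x column from the question, with one missing value
x <- c(1, NA, 3, 4, 5)

# Mean imputation: fill NAs with the mean of the observed values
x_mean <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Median imputation: often preferable when the column is heavily skewed
x_median <- ifelse(is.na(x), median(x, na.rm = TRUE), x)

print(x_mean)
print(x_median)
```

Both versions leave the observed values untouched and only fill the gap, so the vector keeps its original length.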
When dealing with missing data, there are two primary options: imputation or removal of the incomplete observations. Imputation develops reasonable guesses for the missing values. It's most useful when the percentage of missing data is low.
Linear regression: the variable with missing data is used as the dependent variable. Cases with complete data for the predictor variables are used to generate the regression equation; the equation is then used to predict the missing values for the incomplete cases.
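A minimal sketch of regression imputation in base R might look like the following. It assumes, purely for illustration, that y from the question's data is the complete predictor and x is the variable to fill in; imp_fit and x_imputed are hypothetical names.

```r
# Data from the question: x has one missing value, y is complete
test.data <- data.frame(y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))

# Fit x on y; lm's default na.action drops the incomplete row
imp_fit <- lm(x ~ y, data = test.data)

# Predict x only for the rows where it is missing
missing <- is.na(test.data$x)
test.data$x_imputed <- test.data$x
test.data$x_imputed[missing] <-
  predict(imp_fit, newdata = test.data[missing, , drop = FALSE])

print(test.data)
```

Keeping the imputed values in a separate column makes it easy to tell observed values from filled-in ones later.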
Explanation: One of the most widely used imputation methods for repeated measurements is last observation carried forward (LOCF). This method replaces every missing value with the last observed value from the same subject [12].
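LOCF can be sketched in base R without extra packages. The helper locf and the toy panel below are made up for illustration, and the sketch assumes the rows are already sorted by time within each id.

```r
# Carry the last observed value forward; leading NAs stay NA
locf <- function(v) {
  idx <- cumsum(!is.na(v))        # how many observed values seen so far
  c(NA, v[!is.na(v)])[idx + 1]    # idx 0 maps to NA, idx k to the k-th observed value
}

# Toy panel (hypothetical values), sorted by time within id
long <- data.frame(id   = c(1, 1, 1, 2, 2, 2),
                   time = c(1, 2, 3, 1, 2, 3),
                   x    = c(1, NA, NA, NA, 4, NA))

# Apply LOCF separately within each subject
long$x_locf <- ave(long$x, long$id, FUN = locf)

print(long)
```

Note that the first observation of subject 2 stays NA, because there is no earlier value from the same subject to carry forward.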
I think this is something that predict.plm ought to handle for you (it seems like an oversight on the package authors' part), but you can use napredict (see ?napredict) to implement it yourself:
pp <- predict(model, test.data)
na.stuff <- attr(model$model,"na.action")
(yhat <- napredict(na.stuff,pp))
## [1] 1.371429 NA 5.485714 7.542857 9.600000
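To see what napredict is doing without needing plm, here is a small sketch using only base R. It swaps in lm with na.omit as a stand-in for a fit whose predictions come back short (an assumption for illustration, not what plm does internally); for this data a pooled fit is plain OLS, so the numbers should line up with the output above.

```r
test.data <- data.frame(id = c(1, 1, 2, 2, 3), time = c(1, 2, 1, 2, 1),
                        y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))

# Stand-in fit: with na.omit, fitted() has one value per complete row (4 here)
fit <- lm(y ~ x, data = test.data, na.action = na.omit)
pp <- fitted(fit)

# Recover which rows na.exclude would drop, as an "exclude" na.action object
mf <- model.frame(y ~ x, data = test.data, na.action = na.exclude)
na.stuff <- attr(mf, "na.action")

# Pad the short vector back to the original length, with NA in the dropped rows
yhat <- napredict(na.stuff, pp)

# Now the assignment from the question works
test.data$yhat <- yhat
print(test.data)
```

The key point is that napredict only pads when the na.action object has class "exclude"; with an "omit" object it returns the vector unchanged.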