I have a dataset and I want to build a model, preferably with the caret package. My data is actually a time series, but the question is not specific to time series; it's just that I use `createTimeSlices` for the data partitioning.
My data has a certain number of missing values (`NA`), and I imputed them separately from the caret code. I also kept a record of their locations:
```r
# a logical vector the same length as the data: which observations were imputed (NA)
imputed <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
imputed[imputed] <- NA; print(imputed)
#### [1] FALSE FALSE FALSE    NA FALSE FALSE
```
I know there is an option in caret's `train` function to either exclude the `NA`s or impute them with different techniques (the `na.action` argument, or `preProcess` methods such as `"knnImpute"`). That's not what I want: I need to build the model on the already imputed dataset, but I want to exclude the imputed points from the calculation of the error metrics (RMSE, MAE, ...).
I don't know how to do this in caret. In my first script I did the whole cross-validation manually and used a customized error measure:
```r
actual    <- c(5, 4, 3, 6, 7, 5)
predicted <- c(4, 4, 3.5, 7, 6.8, 4)
Metrics::rmse(actual, predicted)  # with all the points
#### [1] 0.7404953
# !NA is NA, so the imputed points become NA and are dropped by na.rm
sqrt(mean((!imputed) * (actual - predicted)^2, na.rm = TRUE))  # excluding the imputed
#### [1] 0.676757
```
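For reference, a manual loop of this kind can be sketched in base R. This is a minimal sketch, not the original script: `rolling_masked_rmse`, `fit_fun` and `predict_fun` are hypothetical names, it assumes a fixed-length training window that slides forward one step at a time (the split `createTimeSlices` produces with its defaults `fixedWindow = TRUE` and `skip = 0`), and a naive mean forecast stands in for the real model.

```r
# Hypothetical sketch: rolling-origin evaluation that scores each test
# window only on observations that were NOT imputed.
rolling_masked_rmse <- function(y, imputed, initial_window, horizon,
                                fit_fun, predict_fun) {
  # TRUE only for genuinely observed points (accepts TRUE or NA as "imputed")
  is_real <- !is.na(imputed) & !imputed
  n <- length(y)
  starts <- seq_len(n - initial_window - horizon + 1)
  vapply(starts, function(s) {
    train_idx <- s:(s + initial_window - 1)                         # sliding fixed window
    test_idx  <- (s + initial_window):(s + initial_window + horizon - 1)
    fit  <- fit_fun(y[train_idx])          # the model is fitted on imputed data too
    pred <- predict_fun(fit, horizon)
    keep <- is_real[test_idx]
    if (!any(keep)) return(NA_real_)       # this test window holds only imputed points
    sqrt(mean((y[test_idx][keep] - pred[keep])^2))
  }, numeric(1))
}

# Example with a naive "mean of the training window" forecast (assumption)
y       <- c(5, 4, 3, 6, 7, 5)
imputed <- c(FALSE, FALSE, FALSE, NA, FALSE, FALSE)
rolling_masked_rmse(y, imputed, initial_window = 3, horizon = 1,
                    fit_fun = mean,
                    predict_fun = function(fit, h) rep(fit, h))
# the first window returns NA because its only test point was imputed
```

Keeping the mask out of the fitting step and applying it only when scoring matches the requirement that the model is trained on the full imputed series.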
How can I do this in caret? Or is there another way that avoids coding everything by hand?
I'm not sure this is what you are looking for, but here is a simple solution using a small helper function:
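```r
# indices of the non-imputed points (NA == FALSE gives NA, which which() skips)
i <- which(imputed == FALSE)
metric_na <- function(fun, actual, predicted, index) {
  # apply any Metrics-style function(actual, predicted) to the selected points only
  fun(actual[index], predicted[index])
}
metric_na(Metrics::rmse, actual, predicted, index = i)
#### [1] 0.676757
metric_na(Metrics::mae, actual, predicted, index = i)
#### [1] 0.54
```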
You can also use the index directly when computing the desired metric:

```r
Metrics::rmse(actual[i], predicted[i])
```
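To get the same behaviour inside caret without hand-coding the resampling loop, one option is a custom `summaryFunction` passed to `trainControl`. The sketch below assumes that the data frame caret hands to the summary function contains `obs`, `pred` and a `rowIndex` column giving the positions of the held-out rows in the training data; please verify this for your caret version (for example by printing `names(data)` inside the function). `make_masked_summary` is a hypothetical helper name.

```r
# Hypothetical helper: returns a caret-style summary function (same
# arguments as caret::defaultSummary: data, lev, model) that computes
# RMSE and MAE on the non-imputed rows only.
make_masked_summary <- function(imputed) {
  # Works with either encoding of the record: TRUE or NA marks an imputed point
  is_real <- !is.na(imputed) & !imputed
  function(data, lev = NULL, model = NULL) {
    # Assumption: data$rowIndex holds the training-row positions of the held-out points
    keep <- is_real[data$rowIndex]
    err  <- data$obs[keep] - data$pred[keep]
    c(RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
  }
}

actual    <- c(5, 4, 3, 6, 7, 5)
predicted <- c(4, 4, 3.5, 7, 6.8, 4)
imputed   <- c(FALSE, FALSE, FALSE, NA, FALSE, FALSE)

masked_summary <- make_masked_summary(imputed)
# Stand-alone check with a hand-built data frame: RMSE about 0.677, MAE 0.54
masked_summary(data.frame(obs = actual, pred = predicted, rowIndex = 1:6))
```

Inside caret this would be wired up roughly as `trainControl(method = "timeslice", initialWindow = ..., horizon = ..., summaryFunction = make_masked_summary(imputed))` and then `train(..., trControl = ctrl, metric = "RMSE")`, with `imputed` aligned to the rows of the training data. Because the mask is only used at scoring time, the models are still fitted on the full imputed series.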