
How to perform RMSE with missing values?

Tags: r, hydrogof

I have a dataset with 679 rows and 16 columns, with 30% missing values. I decided to impute these missing values with the function impute.knn from the package impute, and I got a dataset with 679 rows and 16 columns but without the missing values.

But now I want to check the accuracy using the RMSE and I tried 2 options:

  1. load the package hydroGOF and apply the rmse function
  2. sqrt(mean((obs - sim)^2, na.rm = TRUE))

In both cases I get the error: Error in sim - obs : non-numeric argument to binary operator.

This is happening because the original dataset contains NA values (some values are missing).

How can I calculate the RMSE if I remove the missing values? Then obs and sim will have different sizes.

Telma_7919 asked Jul 17 '13 14:07


1 Answer

How about simply...

sqrt( sum( (df$model - df$measure)^2 , na.rm = TRUE ) / nrow(df) )

This assumes your data frame is called df. You also have to decide on your N: nrow(df) includes the two rows with missing data. Do you want to exclude these from the N observations? I'd guess yes, so instead of nrow(df) you probably want to use sum( !is.na(df$measure) ) or, following @Joshua, just

sqrt( mean( (df$model-df$measure)^2 , na.rm = TRUE ) )
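To make the choice of N concrete, here is a small sketch on made-up data. The data frame df and its values are hypothetical, and the column names measure and model simply follow the ones assumed above. It counts the non-missing squared errors rather than the non-missing values of df$measure, which gives the same N whenever df$model has no gaps.

```r
# Hypothetical data: 'measure' holds observations (with gaps),
# 'model' holds the corresponding predictions.
df <- data.frame(
  measure = c(1, 2, NA, 4, NA),
  model   = c(1.5, 2, 3, 3, 5)
)

# Squared errors; NA wherever either value is missing
sq_err <- (df$model - df$measure)^2

# N = all rows, including those with a missing observation
rmse_all_rows <- sqrt(sum(sq_err, na.rm = TRUE) / nrow(df))

# N = only the rows where a squared error could be computed
rmse_complete <- sqrt(sum(sq_err, na.rm = TRUE) / sum(!is.na(sq_err)))

# mean(..., na.rm = TRUE) drops the NAs first, so it uses the same N
rmse_mean <- sqrt(mean(sq_err, na.rm = TRUE))

print(c(all_rows = rmse_all_rows, complete = rmse_complete, mean = rmse_mean))
```

Dividing by nrow(df) pulls the RMSE down, because the missing rows add nothing to the sum but still count towards N. The mean(..., na.rm = TRUE) form avoids that.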
Simon O'Hanlon answered Sep 17 '22 12:09