I have a dataset with 679 rows and 16 columns, and about 30% of its values are missing. I imputed these missing values with the function impute.knn from the package impute, and I got back a dataset with 679 rows and 16 columns and no missing values.
Now I want to check the accuracy of the imputation using the RMSE, and I tried two options:
1. the rmse function from the package hydroGOF
2. sqrt(mean((obs-sim)^2), na.rm=TRUE)
In both cases I get the error: Error in sim - obs : non-numeric argument to binary operator.
This is happening because the original data set contains NA values (some values are missing).
How can I calculate the RMSE if I remove the missing values? Then obs and sim will have different sizes.
How about simply...
sqrt( sum( (df$model - df$measure)^2 , na.rm = TRUE ) / nrow(df) )
This assumes your data frame is called df. You also have to decide on your N: nrow(df) includes the two rows with missing data. Do you want to exclude these from the N observations? I'd guess yes, so instead of nrow(df) you probably want to use sum( !is.na(df$measure) ), or, following @Joshua, just
sqrt( mean( (df$model-df$measure)^2 , na.rm = TRUE ) )
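To make the choice of N concrete, here is a minimal sketch in base R. The data frame df and its columns measure and model are hypothetical toy values made up for illustration, not the asker's data. It compares dividing by all rows with dividing by only the complete pairs.

```r
# Hypothetical toy data: 'measure' has missing values, 'model' is complete.
df <- data.frame(
  measure = c(1, 2, NA, 4, NA),
  model   = c(1.5, 2, 3, 3, 5)
)

# Squared errors; the result is NA wherever 'measure' is missing.
sq_err <- (df$model - df$measure)^2

# Option 1: divide by nrow(df), so rows with missing data still count in N.
rmse_all_rows <- sqrt(sum(sq_err, na.rm = TRUE) / nrow(df))

# Option 2: divide by the number of complete pairs only.
n_complete    <- sum(!is.na(sq_err))
rmse_complete <- sqrt(sum(sq_err, na.rm = TRUE) / n_complete)

# Option 3: mean() with na.rm = TRUE drops the NA terms and adjusts N,
# so it matches option 2.
rmse_mean <- sqrt(mean(sq_err, na.rm = TRUE))
```

Because the NA rows are skipped rather than deleted, obs and sim never end up with different lengths. If the "non-numeric argument" error persists, it may be worth checking with str() that both inputs are numeric vectors or matrices; as far as I know, impute.knn returns a list whose data component holds the imputed matrix, so the list itself cannot be subtracted directly.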