
Python Pandas: Simple example of calculating RMSE from data frame

Tags: python, pandas

I need a simple example of calculating RMSE with a pandas DataFrame. Suppose there is a function that is called in a loop and returns a true value and a predicted value each time:

def fun(data):
    ...
    return trueVal, predVal

for data in set:
    fun(data)

Some code then puts these results into the following data frame, where x is the real value and p is the predicted value:

In [20]: d
Out[20]: {'p': [1, 10, 4, 5, 5], 'x': [1, 2, 3, 4, 5]}

In [21]: df = pd.DataFrame(d)

In [22]: df
Out[22]: 
    p  x
0   1  1
1  10  2
2   4  3
3   5  4
4   5  5

Questions:

1) How do I put the results from the fun function into the df data frame?

2) How do I calculate RMSE using the df data frame?

zork asked Dec 26 '16 09:12


People also ask

How do you calculate RMSE?

To compute RMSE, calculate the residual (the difference between prediction and truth) for each data point, square each residual, take the mean of the squared residuals, and then take the square root of that mean.
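Written as a formula, using the question's column names (x_i for the true values, p_i for the predictions) and n for the number of rows, the steps above amount to:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - x_i)^2}
```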


2 Answers

Question 1
This depends on the format the data is in. I'd expect you already have your true values, so this function is just a pass-through.

Question 2

With pandas
((df.p - df.x) ** 2).mean() ** .5

With numpy (this relies on df holding exactly the two columns p and x)
(np.diff(df.values) ** 2).mean() ** .5
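To see what the one-liners do, here is a minimal sketch that runs them step by step on the data frame from the question. The intermediate names (residuals, squared, mse) are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data frame from the question: p = predicted value, x = true value
df = pd.DataFrame({'p': [1, 10, 4, 5, 5], 'x': [1, 2, 3, 4, 5]})

residuals = df['p'] - df['x']   # 0, 8, 1, 1, 0
squared = residuals ** 2        # 0, 64, 1, 1, 0
mse = squared.mean()            # 66 / 5 = 13.2
rmse = float(np.sqrt(mse))

# The NumPy one-liner on the same two columns gives the same value
rmse_np = float((np.diff(df[['p', 'x']].to_numpy()) ** 2).mean() ** 0.5)

print(round(rmse, 4))  # 3.6332
```

Selecting the two columns explicitly before calling np.diff keeps the NumPy version working even if df later gains other columns.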

piRSquared answered Sep 19 '22 18:09


Question 1

I understand you already have a data frame df. To add the new values as new rows, do the following:

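for data in set:
    trueVal, predVal = fun(data)
    auxDf = pd.DataFrame([[predVal, trueVal]], columns=['p', 'x'])
    # DataFrame.append returned a new frame rather than modifying df in place
    # (and was removed in pandas 2.0), so reassign the result of pd.concat
    df = pd.concat([df, auxDf], ignore_index=True)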
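Growing a data frame one row at a time copies it on every iteration. An alternative, sketched here under assumptions, is to collect the results in a plain list and build the frame once at the end. The body of fun and the samples list below are hypothetical stand-ins, since the question does not show the real function or input.

```python
import pandas as pd

def fun(data):
    # Hypothetical stand-in for the question's function: treats the input
    # as the true value and adds a fixed offset as the "prediction"
    true_val = data
    pred_val = data + 1
    return true_val, pred_val

samples = [1, 2, 3]  # hypothetical inputs, standing in for the question's `set`

rows = []
for data in samples:
    true_val, pred_val = fun(data)
    rows.append({'p': pred_val, 'x': true_val})

# Build the data frame once, after the loop
df = pd.DataFrame(rows, columns=['p', 'x'])
print(df)
```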

Question 2

To calculate RMSE using df, I recommend using the scikit-learn function.

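from sklearn.metrics import mean_squared_error
realVals = df.x
predictedVals = df.p
mse = mean_squared_error(realVals, predictedVals)
# For the root mean squared error, take the square root of the MSE
rmse = mse ** 0.5
# Older scikit-learn versions also accept mean_squared_error(..., squared=False);
# newer ones provide sklearn.metrics.root_mean_squared_error instead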

It's very important that the columns contain no null values; otherwise mean_squared_error will raise an error.
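One way to guard against this, sketched here with hypothetical data that contains a missing prediction, is to drop incomplete rows before computing the metric. Whether dropping rows is appropriate depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third prediction is missing
df = pd.DataFrame({'p': [1.0, 10.0, np.nan, 5.0, 5.0],
                   'x': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Keep only rows where both the prediction and the true value are present
clean = df.dropna(subset=['p', 'x'])

# RMSE over the remaining four rows: sqrt((0 + 64 + 1 + 0) / 4)
rmse = float(((clean['p'] - clean['x']) ** 2).mean() ** 0.5)
print(len(clean), rmse)
```

Note that the pure pandas expression skips NaN values by default in mean(), so it would not fail on this data. Dropping the rows explicitly makes that choice visible and also keeps the columns safe to pass to scikit-learn.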

Iker answered Sep 19 '22 18:09