 

Python Dataframe: Calculating R^2 and RMSE Using Groupby on One Column

I have the following Python dataframe:

Type    Actual  Predicted
A       4       3
A       10      18
A       13      11
B       3       10
B       4       2
B       8       33
C       20      17
C       40      33
C       87      80
C       32      30
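To reproduce the problem, the pasted table can be read straight into a DataFrame. This is a minimal sketch that assumes the table is whitespace-separated exactly as shown. The variable names `raw` and `df` are only for illustration, since the post does not show how the data is loaded:

```python
import io

import pandas as pd

# The sample table from the question, pasted as whitespace-separated text
raw = """Type    Actual  Predicted
A       4       3
A       10      18
A       13      11
B       3       10
B       4       2
B       8       33
C       20      17
C       40      33
C       87      80
C       32      30
"""

# sep=r"\s+" splits each line on any run of spaces or tabs
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Number of rows per Type: A and B have 3 rows each, C has 4
print(df.groupby("Type").size())
```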

I have code to calculate R^2 and RMSE, but I don't know how to calculate them separately for each distinct "Type".

For now, my approach is to split the larger table into three smaller tables (one each for A, B and C), calculate R^2 and RMSE on each smaller table, and then append the results back together.

But this approach is inefficient. Is there an easier way?

This is the output format I want after grouping:

Type    R^2     RMSE    
A       value   value   
B       value   value   
C       value   value   
asked Dec 20 '17 at 21:12 by PineNuts0





1 Answer

Here is a groupby method:

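import numpy as np
import pandas as pd
from sklearn.metrics import r2_score, mean_squared_error

def r2_rmse(g):
    # g is the sub-DataFrame holding the rows for one "Type"
    r2 = r2_score(g['Actual'], g['Predicted'])
    # RMSE is the square root of the mean squared error
    rmse = np.sqrt(mean_squared_error(g['Actual'], g['Predicted']))
    return pd.Series({'r2': r2, 'rmse': rmse})

# Run r2_rmse once per Type; selecting only the value columns stops newer
# pandas versions from passing the grouping column to apply()
your_df.groupby('Type')[['Actual', 'Predicted']].apply(r2_rmse).reset_index()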
answered Oct 20 '22 at 14:10 by Tom

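If scikit-learn is not available, the same per-group metrics can be computed from their definitions: R^2 = 1 - SS_res / SS_tot and RMSE = sqrt(mean((Actual - Predicted)^2)). The sketch below applies them with the same groupby pattern on the question's sample data. The function name `r2_rmse_numpy` is hypothetical. For a single target, scikit-learn's `r2_score` uses the same R^2 definition, so the results should match, except when every actual value in a group is identical (SS_tot = 0). This sketch reports that case as NaN, while scikit-learn handles it differently.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "Actual": [4, 10, 13, 3, 4, 8, 20, 40, 87, 32],
    "Predicted": [3, 18, 11, 10, 2, 33, 17, 33, 80, 30],
})

def r2_rmse_numpy(g: pd.DataFrame) -> pd.Series:
    """Compute R^2 and RMSE for one group without scikit-learn."""
    actual = g["Actual"].to_numpy(dtype=float)
    predicted = g["Predicted"].to_numpy(dtype=float)
    residuals = actual - predicted
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    # R^2 is undefined when all actual values in the group are equal
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else np.nan
    rmse = np.sqrt(np.mean(residuals ** 2))
    return pd.Series({"R^2": r2, "RMSE": rmse})

# One row per Type, with columns Type, R^2 and RMSE as requested
result = (
    df.groupby("Type")[["Actual", "Predicted"]]
    .apply(r2_rmse_numpy)
    .reset_index()
)
print(result)
```

Because every group is handled by one `groupby` call, there is no need to split the table manually or append partial results.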