Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate mean values grouped on another column in Pandas

For the following dataframe:

StationID  HoursAhead    BiasTemp   SS0279           0          10 SS0279           1          20 KEOPS            0          0 KEOPS            1          5 BB               0          5 BB               1          5 

I'd like to get something like:

StationID  BiasTemp   SS0279     15 KEOPS      2.5 BB         5 

I know I can script something like this to get the desired result:

def transform_DF(old_df,col):     list_stations = list(set(old_df['StationID'].values.tolist()))     header = list(old_df.columns.values)     header.remove(col)     header_new = header     new_df = pandas.DataFrame(columns = header_new)     for i,station in enumerate(list_stations):         general_results = old_df[(old_df['StationID'] == station)].describe()         new_row = []         for column in header_new:             if column in ['StationID']:                  new_row.append(station)                 continue             new_row.append(general_results[column]['mean'])         new_df.loc[i] = new_row     return new_df 

But I wonder if there is something more straightforward in pandas.

like image 806
Rafa Avatar asked May 27 '15 12:05

Rafa


People also ask

How do you find mean of a column grouped by another column in pandas?

To calculate the mean of whole columns in the DataFrame, use pandas. Series. mean() with a list of DataFrame columns. You can also get the mean for all numeric columns using DataFrame.


Video Answer


2 Answers

You could groupby on StationID and then take mean() on BiasTemp. To output Dataframe, use as_index=False

In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean() Out[4]:   StationID  BiasTemp 0        BB       5.0 1     KEOPS       2.5 2    SS0279      15.0 

Without as_index=False, it returns a Series instead

In [5]: df.groupby('StationID')['BiasTemp'].mean() Out[5]: StationID BB            5.0 KEOPS         2.5 SS0279       15.0 Name: BiasTemp, dtype: float64 

Read more about groupby in this pydata tutorial.

like image 160
Zero Avatar answered Oct 06 '22 00:10

Zero


This is what groupby is for:

In [117]: df.groupby('StationID')['BiasTemp'].mean()  Out[117]: StationID BB         5.0 KEOPS      2.5 SS0279    15.0 Name: BiasTemp, dtype: float64 

Here we groupby the 'StationID' column, we then access the 'BiasTemp' column and call mean on it

There is a section in the docs on this functionality.

like image 35
EdChum Avatar answered Oct 05 '22 22:10

EdChum