For the following dataframe:
StationID HoursAhead BiasTemp SS0279 0 10 SS0279 1 20 KEOPS 0 0 KEOPS 1 5 BB 0 5 BB 1 5
I'd like to get something like:
StationID BiasTemp SS0279 15 KEOPS 2.5 BB 5
I know I can script something like this to get the desired result:
def transform_DF(old_df,col): list_stations = list(set(old_df['StationID'].values.tolist())) header = list(old_df.columns.values) header.remove(col) header_new = header new_df = pandas.DataFrame(columns = header_new) for i,station in enumerate(list_stations): general_results = old_df[(old_df['StationID'] == station)].describe() new_row = [] for column in header_new: if column in ['StationID']: new_row.append(station) continue new_row.append(general_results[column]['mean']) new_df.loc[i] = new_row return new_df
But I wonder if there is something more straightforward in pandas.
To calculate the mean of whole columns in the DataFrame, use pandas. Series. mean() with a list of DataFrame columns. You can also get the mean for all numeric columns using DataFrame.
You could groupby
on StationID
and then take mean()
on BiasTemp
. To output Dataframe
, use as_index=False
In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean() Out[4]: StationID BiasTemp 0 BB 5.0 1 KEOPS 2.5 2 SS0279 15.0
Without as_index=False
, it returns a Series
instead
In [5]: df.groupby('StationID')['BiasTemp'].mean() Out[5]: StationID BB 5.0 KEOPS 2.5 SS0279 15.0 Name: BiasTemp, dtype: float64
Read more about groupby
in this pydata tutorial.
This is what groupby
is for:
In [117]: df.groupby('StationID')['BiasTemp'].mean() Out[117]: StationID BB 5.0 KEOPS 2.5 SS0279 15.0 Name: BiasTemp, dtype: float64
Here we groupby the 'StationID' column, we then access the 'BiasTemp' column and call mean
on it
There is a section in the docs on this functionality.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With