For the following dataframe:
StationID HoursAhead BiasTemp SS0279 0 10 SS0279 1 20 KEOPS 0 0 KEOPS 1 5 BB 0 5 BB 1 5 I'd like to get something like:
StationID BiasTemp SS0279 15 KEOPS 2.5 BB 5 I know I can script something like this to get the desired result:
def transform_DF(old_df,col): list_stations = list(set(old_df['StationID'].values.tolist())) header = list(old_df.columns.values) header.remove(col) header_new = header new_df = pandas.DataFrame(columns = header_new) for i,station in enumerate(list_stations): general_results = old_df[(old_df['StationID'] == station)].describe() new_row = [] for column in header_new: if column in ['StationID']: new_row.append(station) continue new_row.append(general_results[column]['mean']) new_df.loc[i] = new_row return new_df But I wonder if there is something more straightforward in pandas.
To calculate the mean of whole columns in the DataFrame, use pandas. Series. mean() with a list of DataFrame columns. You can also get the mean for all numeric columns using DataFrame.
You could groupby on StationID and then take mean() on BiasTemp. To output Dataframe, use as_index=False
In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean() Out[4]: StationID BiasTemp 0 BB 5.0 1 KEOPS 2.5 2 SS0279 15.0 Without as_index=False, it returns a Series instead
In [5]: df.groupby('StationID')['BiasTemp'].mean() Out[5]: StationID BB 5.0 KEOPS 2.5 SS0279 15.0 Name: BiasTemp, dtype: float64 Read more about groupby in this pydata tutorial.
This is what groupby is for:
In [117]: df.groupby('StationID')['BiasTemp'].mean() Out[117]: StationID BB 5.0 KEOPS 2.5 SS0279 15.0 Name: BiasTemp, dtype: float64 Here we groupby the 'StationID' column, we then access the 'BiasTemp' column and call mean on it
There is a section in the docs on this functionality.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With