I have this data:
ID  TIME
1   2
1   4
1   2
2   3
I want to group the data by ID
and calculate the mean time and the size of each group.
ID  MEAN_TIME  COUNT
1   2.67       3
2   3.00       1
If I run this code, then I get an error "ValueError: cannot insert ID, already exists":
result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index()
Use the parameter drop=True,
which does not create a new column from the index
but removes the index instead:
result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index(drop=True)
print (result)
   ID      TIME
0   3  2.666667
1   1  3.000000
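To see where the error comes from, here is a minimal sketch. The frame `agg` is a hypothetical stand-in for the aggregated result (it is built by hand, not produced by the groupby above): its index is named ID and it also has a column named ID, so reset_index() has nowhere to put the index. With drop=True the index is simply discarded, which also means the group keys are lost.

```python
import pandas as pd

# Hypothetical stand-in for the aggregated frame:
# the index is named "ID" and a column is also named "ID".
agg = pd.DataFrame(
    {"TIME": [8 / 3, 3.0], "ID": [3, 1]},
    index=pd.Index([1, 2], name="ID"),
)

# reset_index() tries to insert the index as a new "ID" column and fails.
try:
    agg.reset_index()
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# drop=True discards the index instead of inserting it,
# so the original group keys (1 and 2) are no longer in the frame.
dropped = agg.reset_index(drop=True)
print(dropped)
```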
But if you need a new column from the index, rename
the old columns first:
result = (df.groupby(['ID'])
            .agg({'TIME': 'mean', 'ID': 'count'})
            .rename(columns={'ID': 'COUNT', 'TIME': 'MEAN_TIME'})
            .reset_index())
print (result)
   ID  COUNT  MEAN_TIME
0   1      3   2.666667
1   2      1   3.000000
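A solution if you need to aggregate several columns and rename them with nested dictionaries (note that this nested-dict form was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError):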
result = df.groupby(['ID']).agg({'TIME':{'MEAN_TIME': 'mean'}, 'ID': {'COUNT': 'count'}})
result.columns = result.columns.droplevel(0)
print (result.reset_index())
   ID  COUNT  MEAN_TIME
0   1      3   2.666667
1   2      1   3.000000
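On pandas 0.25 or later, named aggregation should give the same renamed columns without nested dictionaries. The sketch below rebuilds the sample data from the question and is one possible approach, not the only one. It counts the TIME column rather than ID, which is an assumption on my part: 'count' skips missing values, so if TIME can contain NaN and you want the true group size, 'size' may be the better choice.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"ID": [1, 1, 1, 2], "TIME": [2, 4, 2, 3]})

# Named aggregation: new_column_name=(source_column, aggregation)
# COUNT is computed from TIME (assumption), so it never clashes with the ID index.
result = (
    df.groupby("ID")
      .agg(MEAN_TIME=("TIME", "mean"), COUNT=("TIME", "count"))
      .reset_index()
)
print(result)
```

Because no output column is named ID, reset_index() can insert the group keys as a regular column without a name clash.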