The default behavior of pandas groupby is to turn the group-by columns into the index of the result and remove them from its list of columns. For instance, say I have a DataFrame with these columns:
col1|col2|col3|col4
If I apply a groupby on, say, columns col2 and col3, like this:
df.groupby(['col2','col3']).sum()
the resulting DataFrame no longer has ['col2','col3'] in its list of columns (df itself is unchanged). They are automatically turned into the index (a MultiIndex) of the result.
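A minimal sketch with made-up data makes the default behavior visible. The column names come from the question; the values are hypothetical and chosen only so that every non-key column is numeric and can be summed.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": ["a", "a", "b", "b"],
    "col3": ["x", "x", "y", "z"],
    "col4": [10, 20, 30, 40],
})

result = df.groupby(["col2", "col3"]).sum()

# The group keys become the levels of the result's MultiIndex...
print(list(result.index.names))  # ['col2', 'col3']
# ...and are no longer regular columns of the result
print(list(result.columns))      # ['col1', 'col4']
# The original df still has all four columns
print(list(df.columns))          # ['col1', 'col2', 'col3', 'col4']
```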
My question is: how can I perform a groupby on a column and still keep that column as a regular column in the resulting DataFrame?
The pandas groupby() method is versatile. It splits the data into groups based on the values of one or more key columns, after which you can apply an aggregation such as mean, median or value_counts to each group. To turn the group keys back into regular columns after groupby(), use the reset_index() method.
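As a small illustration of reset_index() after an aggregation, here is a sketch with a hypothetical DataFrame (the "team" and "score" columns are made up for this example) that groups on one key and takes the mean:

```python
import pandas as pd

# Hypothetical data: one key column and one numeric column
df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "score": [10, 20, 30, 50],
})

# Aggregating moves the group key "team" into the index of the resulting Series
means = df.groupby("team")["score"].mean()

# reset_index() turns that index back into a regular column
means_df = means.reset_index()
print(means_df)
```

The result is a DataFrame with a default integer index and the columns team and score.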
The groupby().rolling() combination may not preserve the original index in a usable form, so when dates repeat within a group it can be impossible to tell which row of the original DataFrame each result belongs to.
Fortunately, the GroupBy object supports column indexing just like a DataFrame.
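A short sketch of column indexing on a GroupBy object, assuming a hypothetical DataFrame: selecting a single column gives a SeriesGroupBy, while selecting a list of columns gives a DataFrameGroupBy.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["a", "a", "b"],
    "col3": [5, 6, 7],
})

g = df.groupby("col2")

# Single column: aggregation returns a Series indexed by col2
s = g["col3"].sum()

# List of columns: aggregation returns a DataFrame indexed by col2
d = g[["col1", "col3"]].sum()
print(s)
print(d)
```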
Using groupby() to group rows into a list: with DataFrame.groupby() you can group rows on a column, select the column you want from the grouped result, and convert each group's values to a list using apply(list).
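A minimal sketch of the apply(list) pattern on hypothetical data, followed by reset_index() so that the key stays a regular column:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "col2": ["a", "a", "b", "b", "b"],
    "col3": [1, 2, 3, 4, 5],
})

# One list of col3 values per distinct col2 value
lists = df.groupby("col2")["col3"].apply(list)

# Move col2 back from the index into a column
lists_df = lists.reset_index()
print(lists_df)
```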
Pass as_index=False to groupby() so that the group keys stay as regular columns:
df.groupby(['col2','col3'], as_index=False).sum()
Another way to do this would be:
df.groupby(['col2', 'col3']).sum().reset_index()
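To check that the two approaches agree, here is a sketch on hypothetical data (column names from the question, values made up) that compares them with pandas' own testing helper:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": ["a", "a", "b", "b"],
    "col3": ["x", "x", "y", "z"],
    "col4": [10, 20, 30, 40],
})

# Approach 1: keep the keys as columns from the start
a = df.groupby(["col2", "col3"], as_index=False).sum()

# Approach 2: aggregate with keys in the index, then move them back to columns
b = df.groupby(["col2", "col3"]).sum().reset_index()

print(list(a.columns))  # ['col2', 'col3', 'col1', 'col4']

# Raises an AssertionError if the two frames differ
pd.testing.assert_frame_equal(a, b)
```

In both cases the key columns come first, followed by the aggregated columns, with a default integer index. as_index=False simply avoids the extra reset_index() call.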