Take the following dataframe:
import pandas as pd
df = pd.DataFrame({'group_name': ['A','A','A','B','B','B'],
'timestamp': [4,6,1000,5,8,100],
'condition': [True,True,False,True,False,True]})
I want to add two columns:
condition
column within each groupI know I can do it with a custom apply, but I'm wondering if anyone has any fun ideas? (Also this is slow when there are many groups.) Here's one solution:
def range_within_group(input_df):
df_to_return = input_df.copy()
df_to_return = df_to_return.sort('timestamp')
df_to_return['order_within_group'] = range(len(df_to_return))
df_to_return['rolling_sum_of_condition'] = df_to_return.condition.cumsum()
return df_to_return
df.groupby('group_name').apply(range_within_group).reset_index(drop=True)
Use double brackets to reorder columns in a DataFrame Use the syntax DataFrame[["column1", "column2", "column3"]] with the column names in the desired order to reorder the columns.
Groupby preserves the order of rows within each group. Thus, it is clear the "Groupby" does preserve the order of rows within each group.
To group Pandas dataframe, we use groupby(). To sort grouped dataframe in ascending or descending order, use sort_values(). The size() method is used to get the dataframe size.
GroupBy.cumcount
does:
Number each item in each group from 0 to the length of that group - 1.
so simply:
>>> gr = df.sort('timestamp').groupby('group_name')
>>> df['order_within_group'] = gr.cumcount()
>>> df['rolling_sum_of_condition'] = gr['condition'].cumsum()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With