I have a dataframe on which I need to combine the results of two different groupbys, one of them filtered.
ID  EVENT   SUCCESS
1   PUT     Y
2   POST    Y
2   PUT     N
1   DELETE  Y
The table below is how I would like the data to look. The first part is the count of each 'EVENT' per ID; the second is the count of successes ('Y') per ID.
ID  PUT  POST  DELETE  SUCCESS
1   1    0     1       2
2   1    1     0       1
I've tried a few techniques, and the closest I've found is two separate methods, which yield the following:
group_df = df.groupby(['ID', 'EVENT'])
count_group_df = group_df.size().unstack()
This yields the following for the 'EVENT' counts:
ID  PUT  POST  DELETE
1   1    0     1
2   1    1     0
For the successes with a filter, I don't know whether I can join this to the first set on 'ID':
df_success = df.loc[df['SUCCESS'] == 'Y', ['ID', 'SUCCESS']]
count_group_df_2 = df_success.groupby(['ID', 'SUCCESS'])
ID  SUCCESS
1   2
2   1
I need to combine these somehow?
Additionally, I'd like to merge the counts of two of the 'EVENT' values, for example PUT and POST, into one column.
Use concat to merge them together:
df1 = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
df_success = (df['SUCCESS'] == 'Y').groupby(df['ID']).sum().astype(int)
df = pd.concat([df1, df_success],axis=1)
print (df)
    DELETE  POST  PUT  SUCCESS
ID
1        1     0    1        2
2        0     1    1        1
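For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same concat approach. The sample frame is rebuilt from the table in the question; the variable names event_counts, success_counts and result are only illustrative and are not from the original post.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 1],
    "EVENT": ["PUT", "POST", "PUT", "DELETE"],
    "SUCCESS": ["Y", "Y", "N", "Y"],
})

# Count rows per (ID, EVENT) and pivot the EVENT values into columns.
event_counts = df.groupby(["ID", "EVENT"]).size().unstack(fill_value=0)

# Summing a boolean mask per ID gives the number of 'Y' rows per ID.
# The resulting Series keeps the name 'SUCCESS' from the source column.
success_counts = (df["SUCCESS"] == "Y").groupby(df["ID"]).sum().astype(int)

# Both objects are indexed by ID, so concat aligns them row by row.
result = pd.concat([event_counts, success_counts], axis=1)
print(result)
```

Because the boolean mask is grouped over every row, an ID with no 'Y' rows still appears in success_counts with a count of 0.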
Another solution with value_counts:
df1 = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
df_success = df.loc[df['SUCCESS'] == 'Y', 'ID'].value_counts().rename('SUCCESS')
df = pd.concat([df1, df_success],axis=1)
print (df)
    DELETE  POST  PUT  SUCCESS
ID
1        1     0    1        2
2        0     1    1        1
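One difference worth knowing about: value_counts only sees the rows that pass the filter, so an ID with no 'Y' rows is missing from it, and concat fills that position with NaN. The sketch below shows this with a hypothetical extra ID 3 (not in the original data) and fills the gap with 0; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data: the question's rows plus ID 3, which has no successful event.
df = pd.DataFrame({
    "ID": [1, 2, 2, 1, 3],
    "EVENT": ["PUT", "POST", "PUT", "DELETE", "PUT"],
    "SUCCESS": ["Y", "Y", "N", "Y", "N"],
})

event_counts = df.groupby(["ID", "EVENT"]).size().unstack(fill_value=0)

# Only IDs with at least one 'Y' row appear in this Series.
success_counts = df.loc[df["SUCCESS"] == "Y", "ID"].value_counts().rename("SUCCESS")

result = pd.concat([event_counts, success_counts], axis=1)

# ID 3 is absent from success_counts, so its SUCCESS value is NaN after concat.
# Replace it with 0 and restore an integer dtype.
result["SUCCESS"] = result["SUCCESS"].fillna(0).astype(int)
print(result)
```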
Finally, it is possible to convert the ID index to a column with reset_index and remove the columns axis name with rename_axis:
df = df.reset_index().rename_axis(None, axis=1)
print (df)
   ID  DELETE  POST  PUT  SUCCESS
0   1       1     0    1        2
1   2       0     1    1        1
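The question also asks about merging the counts of two events, such as PUT and POST, into one column. One possible way, sketched here under the assumption that both columns exist after unstacking, is to add them and drop the originals. The column name PUT_POST is a made-up label for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 1],
    "EVENT": ["PUT", "POST", "PUT", "DELETE"],
    "SUCCESS": ["Y", "Y", "N", "Y"],
})

counts = df.groupby(["ID", "EVENT"]).size().unstack(fill_value=0)

# Add the two event columns into a new one (hypothetical name),
# then drop the originals. This raises KeyError if either event
# never occurs in the data.
counts["PUT_POST"] = counts["PUT"] + counts["POST"]
counts = counts.drop(columns=["PUT", "POST"])
print(counts)
```

An alternative would be to relabel the events before grouping, which avoids the KeyError when one of them is absent from the data.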