
Pandas: combine two groupbys, filter, and merge the groups (counts)

I have a dataframe for which I need to combine two different groupbys, one of them filtered.

 ID     EVENT      SUCCESS
 1       PUT          Y
 2       POST         Y
 2       PUT          N
 1       DELETE       Y 
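To reproduce the examples, the sample table above can be built as a DataFrame like this (a minimal sketch; the name `df` matches the code used throughout):

```python
import pandas as pd

# Sample data from the table above: one row per event
df = pd.DataFrame({
    'ID': [1, 2, 2, 1],
    'EVENT': ['PUT', 'POST', 'PUT', 'DELETE'],
    'SUCCESS': ['Y', 'Y', 'N', 'Y'],
})
print(df)
```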

The table below shows how I would like the data to look. The first part counts each 'EVENT' per ID; the second counts the number of successes ('Y') per ID:

ID  PUT   POST  DELETE SUCCESS
 1   1     0       1      2
 2   1     1       0      1

I've tried a few techniques, and the closest I've found is two separate methods, which yield the following:

group_df = df.groupby(['ID', 'EVENT'])
count_group_df = group_df.size().unstack()

This yields the following for the 'EVENT' counts:

ID  PUT   POST  DELETE
 1   1     0       1      
 2   1     1       0      
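Note that `unstack()` without arguments fills missing ID/EVENT combinations (for example ID 1 with POST) with NaN and upcasts the counts to float; passing `fill_value=0` gives integer zeros. Also, because `groupby` sorts its keys by default, the columns come out alphabetically (DELETE, POST, PUT). A minimal sketch with the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 1],
    'EVENT': ['PUT', 'POST', 'PUT', 'DELETE'],
    'SUCCESS': ['Y', 'Y', 'N', 'Y'],
})

# Without fill_value, absent combinations become NaN (float columns)
with_nan = df.groupby(['ID', 'EVENT']).size().unstack()
# With fill_value=0, absent combinations become 0 (integer columns)
with_zero = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
print(with_nan)
print(with_zero)
```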

For the successes I filter first, but I don't know whether I can join the result to the first set on 'ID':

df_success = df.loc[df['SUCCESS'] == 'Y', ['ID', 'SUCCESS']]
count_group_df_2 = df_success.groupby('ID')['SUCCESS'].count()


ID  SUCCESS
1      2
2      1
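Since both results are indexed by ID, the filtered counts can be joined on it directly. A minimal sketch using `DataFrame.join` (a left join on the index); an ID with no 'Y' rows would get NaN, so `fillna(0)` is applied before converting back to integers:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 1],
    'EVENT': ['PUT', 'POST', 'PUT', 'DELETE'],
    'SUCCESS': ['Y', 'Y', 'N', 'Y'],
})

# Event counts per ID (columns DELETE, POST, PUT)
event_counts = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
# Number of 'Y' rows per ID, as a Series named SUCCESS
success_counts = df.loc[df['SUCCESS'] == 'Y'].groupby('ID').size().rename('SUCCESS')
# Left join on the shared ID index; IDs without any 'Y' would be NaN, so fill with 0
result = event_counts.join(success_counts).fillna(0).astype(int)
print(result)
```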

How can I combine these?

Additionally, I'd like to merge the counts of two of the 'EVENT' values, for example PUT and POST, into one column.

asked Jun 02 '17 09:06 by Sheepy


1 Answer

Use concat to merge them together:

import pandas as pd

df1 = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
df_success = (df['SUCCESS'] == 'Y').groupby(df['ID']).sum().astype(int)
df = pd.concat([df1, df_success],axis=1)
print (df)
    DELETE  POST  PUT  SUCCESS
ID                            
1        1     0    1        2
2        0     1    1        1

Another solution with value_counts:

df1 = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
df_success = df.loc[df['SUCCESS'] == 'Y', 'ID'].value_counts().rename('SUCCESS')
df = pd.concat([df1, df_success],axis=1)
print (df)
    DELETE  POST  PUT  SUCCESS
ID                            
1        1     0    1        2
2        0     1    1        1
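The two solutions differ when some ID has no 'Y' rows: the boolean-sum version yields 0 for it, while `value_counts` simply omits that ID, so `concat` leaves NaN in SUCCESS (and makes the column float). A minimal sketch, using a hypothetical extra row for ID 3 whose only event failed:

```python
import pandas as pd

# Sample data plus a hypothetical ID 3 whose only event failed
df = pd.DataFrame({
    'ID': [1, 2, 2, 1, 3],
    'EVENT': ['PUT', 'POST', 'PUT', 'DELETE', 'PUT'],
    'SUCCESS': ['Y', 'Y', 'N', 'Y', 'N'],
})

df1 = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
df_success = df.loc[df['SUCCESS'] == 'Y', 'ID'].value_counts().rename('SUCCESS')

# ID 3 is missing from df_success, so concat produces NaN for it
out = pd.concat([df1, df_success], axis=1)
print(out)

# Filling the gap restores integer counts
out['SUCCESS'] = out['SUCCESS'].fillna(0).astype(int)
print(out)
```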

Finally, it is possible to convert the ID index into a regular column with reset_index and clear the columns axis name with rename_axis:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
   ID  DELETE  POST  PUT  SUCCESS
0   1       1     0    1        2
1   2       0     1    1        1
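For the second part of the question (merging the PUT and POST counts into one column), one option is to sum the two count columns and drop the originals. A minimal sketch; the combined column name `PUT_POST` is a hypothetical choice:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 1],
    'EVENT': ['PUT', 'POST', 'PUT', 'DELETE'],
    'SUCCESS': ['Y', 'Y', 'N', 'Y'],
})

df1 = df.groupby(['ID', 'EVENT']).size().unstack(fill_value=0)
# Hypothetical combined column: PUT + POST counts per ID
df1['PUT_POST'] = df1['PUT'] + df1['POST']
# Remove the separate columns now that they are merged
df1 = df1.drop(columns=['PUT', 'POST'])
print(df1)
```

Alternatively, the EVENT values could be relabelled with `df['EVENT'].replace({'PUT': 'PUT_POST', 'POST': 'PUT_POST'})` before grouping, so the merged column comes straight out of the same `groupby`.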
answered Oct 04 '22 20:10 by jezrael