I have a dataframe say like this
>>> df = pd.DataFrame({'user_id':['a','a','s','s','s'],                     'session':[4,5,4,5,5],                     'revenue':[-1,0,1,2,1]})  >>> df    revenue  session user_id 0       -1        4       a 1        0        5       a 2        1        4       s 3        2        5       s 4        1        5       s   And each value of session and revenue represents a kind of type, and I want to count the number of each kind say the number of revenue=-1 and session=4 of user_id=a is 1.
And I found simple call count() function after groupby() can't output the result I want.
>>> df.groupby('user_id').count()          revenue  session user_id a              2        2 s              3        3   How can I do that?
You seem to want to group by several columns at once:
df.groupby(['revenue','session','user_id'])['user_id'].count()   should give you what you want
df.value_counts is available!From pandas 1.1, this will be my recommended method for counting the number of rows in groups (i.e., the group size). To count the number of non-nan rows in a group for a specific column, check out the accepted answer.
Old
df.groupby(['A', 'B']).size()   # df.groupby(['A', 'B'])['C'].count()  New [✓]
df.value_counts(subset=['A', 'B'])   Note that size and count are not identical, the former counts all rows per group, the latter counts non-null rows only. See this other answer of mine for more.
pd.__version__ # '1.1.0.dev0+2004.g8d10bfb6f'  df = pd.DataFrame({'num_legs': [2, 4, 4, 6],                    'num_wings': [2, 0, 0, 0]},                   index=['falcon', 'dog', 'cat', 'ant']) df         num_legs  num_wings falcon         2          2 dog            4          0 cat            4          0 ant            6          0   df.value_counts(subset=['num_legs', 'num_wings'], sort=False)  num_legs  num_wings 2         2            1 4         0            2 6         0            1 dtype: int64  Compare this output with
df.groupby(['num_legs', 'num_wings'])['num_legs'].size()  num_legs  num_wings 2         2            1 4         0            2 6         0            1 Name: num_legs, dtype: int64  It's also faster if you don't sort the result:
%timeit df.groupby(['num_legs', 'num_wings'])['num_legs'].count() %timeit df.value_counts(subset=['num_legs', 'num_wings'], sort=False)  640 µs ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 568 µs ± 6.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With