 

Pandas, group by count and add count to original dataframe?

When trying to count rows with the same 'kind' in a data frame:

import pandas as pd

items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'), 
         ('bb', 'bb text 4'),
         ('cccc','cccc text 1'), ('cccc','cccc text 2'),
         ('dd', 'dd text 1'),
         ('e', 'e text 1'),
         ('fff', 'fff text 1'),
        ]

df = pd.DataFrame(items, columns=['kind', 'msg'])
df

    kind    msg
0   aaa     aaa text 1
1   aaa     aaa text 2
2   aaa     aaa text 3
3   bb      bb text 1
4   bb      bb text 2
5   bb      bb text 3
6   bb      bb text 4
7   cccc    cccc text 1
8   cccc    cccc text 2
9   dd      dd text 1
10  e       e text 1
11  fff     fff text 1

This code works:

df = df[['kind']].groupby(['kind'])['kind'] \
                         .count() \
                         .reset_index(name='count') \
                         .sort_values(['count'], ascending=False) \
                         .head(5)

df

Resulting in:

   kind  count
1    bb      4
0   aaa      3
2  cccc      2
3    dd      1
4     e      1

However, how can I get a data frame with all the columns of the original one plus a 'count' column? The result should have the columns 'kind', 'msg', and 'count', in that order.

Also, how can I sort the resulting data frame in descending order of count?

asked Jul 27 '17 09:07 by dokondr




2 Answers

IIUC

In [247]: df['count'] = df.groupby('kind').transform('count')

In [248]: df
Out[248]:
    kind          msg  count
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1
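Note that df.groupby('kind').transform('count') counts the non-missing values of every non-grouping column. Assigning the result to one column works here only because msg is the only other column, and a missing msg value would also lower the count. The following is a minimal sketch of a more targeted variant, assuming you want the number of rows per kind regardless of missing values: transform('size') on a single selected column. The data and the msg_count column below are hypothetical and exist only to show the difference.

```python
import pandas as pd

# Hypothetical data: one 'bb' row has a missing message
df = pd.DataFrame({
    'kind': ['aaa', 'aaa', 'bb', 'bb', 'bb'],
    'msg': ['a1', 'a2', 'b1', None, 'b3'],
})

# 'count' skips missing values in the column being counted
df['msg_count'] = df.groupby('kind')['msg'].transform('count')

# 'size' counts rows per group, missing values included
df['count'] = df.groupby('kind')['kind'].transform('size')
print(df)
```

Here msg_count is 2 for every bb row while count is 3, because the None message is skipped by 'count' but not by 'size'.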

sorting:

In [249]: df.sort_values('count', ascending=False)
Out[249]:
    kind          msg  count
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1
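Sorting on count alone does not define how rows with equal counts are ordered. With less tidy data, the rows of two kinds that share a count could stay interleaved. The sketch below assumes you want each kind's rows kept together, with ties between kinds broken alphabetically, and adds kind as a second sort key. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data where two kinds with equal counts (x and y) are interleaved
df = pd.DataFrame({
    'kind': ['x', 'y', 'x', 'y', 'z', 'z', 'z'],
    'msg': ['x1', 'y1', 'x2', 'y2', 'z1', 'z2', 'z3'],
})
df['count'] = df.groupby('kind')['kind'].transform('count')

# Highest count first; among equal counts, sort by kind so each kind's rows are contiguous
out = df.sort_values(['count', 'kind'], ascending=[False, True])
print(out)
```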
answered Sep 19 '22 17:09 by MaxU - stop WAR against UA


Here is a simple way to count the frequencies and add them as a new column to the data frame, grouping by the 'kind' column:

df['count'] = df.groupby('kind')['kind'].transform('count')
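The following sketch uses a different technique that skips groupby, and it is not part of the answer above. It maps each row's kind to its frequency from Series.value_counts. Selecting the columns explicitly then guarantees the 'kind', 'msg', 'count' order the question asks for, and sort_values puts the largest groups first.

```python
import pandas as pd

items = [('aaa', 'aaa text 1'), ('aaa', 'aaa text 2'), ('aaa', 'aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'),
         ('bb', 'bb text 4'),
         ('cccc', 'cccc text 1'), ('cccc', 'cccc text 2'),
         ('dd', 'dd text 1'), ('e', 'e text 1'), ('fff', 'fff text 1')]
df = pd.DataFrame(items, columns=['kind', 'msg'])

# value_counts() returns a Series indexed by kind; map() looks up each row's kind in it
df['count'] = df['kind'].map(df['kind'].value_counts())

# Fix the column order explicitly and put the largest groups first
result = df[['kind', 'msg', 'count']].sort_values('count', ascending=False)
print(result)
```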
answered Sep 19 '22 17:09 by Shubham Singh Chauhan