Pandas DataFrame Groupby two columns and get counts

I have a pandas dataframe in the following format:

import pandas as pd

# Each inner list is one column; .T transposes them into rows.
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
                   list('AAABBBBABCBDDD'),
                   [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
                   ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
                   ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1']]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'col5']
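Because the data is built as a list of column lists and then transposed with .T, every column of the resulting frame has object dtype. A minimal sketch, assuming numeric col1/col3 are what is wanted, builds the same frame column by column so pandas infers a dtype per column (col5 is kept as strings, as in the question):

```python
import pandas as pd

# Same data as the question, built column-wise so pandas infers a
# dtype for each column instead of leaving everything as object after .T.
df = pd.DataFrame({
    'col1': [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    'col2': list('AAABBBBABCBDDD'),
    'col3': [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    'col4': ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x',
             'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    # col5 stays as strings, exactly as in the question.
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

print(df.dtypes)
```

The groupby results below are the same either way; only the column dtypes differ.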

df:

    col1 col2 col3     col4 col5
0    1.1    A  1.1    x/y/z    1
1    1.1    A  1.7      x/y    3
2    1.1    A  2.5  x/y/z/n    3
3    2.6    B  2.6      x/u    2
4    2.5    B  3.3        x    4
5    3.4    B  3.8    x/u/v    2
6    2.6    B    4    x/y/z    5
7    2.6    A  4.2        x    3
8    3.4    B  4.3  x/u/v/b    6
9    3.4    C  4.5        -    3
10   2.6    B  4.6      x/y    5
11   1.1    D  4.7    x/y/z    1
12   1.1    D  4.7        x    1
13   3.3    D  4.8  x/u/v/w    1

Now I want to group this by two columns, like the following:

df.groupby(['col5','col2']).reset_index() 

Output:

             index col1 col2 col3     col4 col5
col5 col2
1    A    0      0  1.1    A  1.1    x/y/z    1
     D    0     11  1.1    D  4.7    x/y/z    1
          1     12  1.1    D  4.7        x    1
          2     13  3.3    D  4.8  x/u/v/w    1
2    B    0      3  2.6    B  2.6      x/u    2
          1      5  3.4    B  3.8    x/u/v    2
3    A    0      1  1.1    A  1.7      x/y    3
          1      2  1.1    A  2.5  x/y/z/n    3
          2      7  2.6    A  4.2        x    3
     C    0      9  3.4    C  4.5        -    3
4    B    0      4  2.5    B  3.3        x    4
5    B    0      6  2.6    B    4    x/y/z    5
          1     10  2.6    B  4.6      x/y    5
6    B    0      8  3.4    B  4.3  x/u/v/b    6

I want to get the number of rows in each (col5, col2) group, like the following. Expected output:

col5 col2 count
1    A      1
     D      3
2    B      2
etc...

How do I get my expected output? I also want to find the largest count for each 'col2' value.

Nilani Algiriyage asked Jul 16 '13 at 14:07


People also ask

How do you group by in pandas and get counts?

Use DataFrame.groupby() to group the rows by one or more columns, then call count() to get, for each group, the number of non-null values in each column (None and NaN are ignored). It works with non-numeric data as well. To count all rows in each group regardless of missing values, use size() instead.
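A small sketch contrasting count() with size(), using a made-up frame (the column names g and v are hypothetical) in which group 'a' contains one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: group 'a' has one NaN in column v.
toy = pd.DataFrame({'g': ['a', 'a', 'b'], 'v': [1.0, np.nan, 3.0]})

# count(): non-null values in v for each group (the NaN is skipped).
counts = toy.groupby('g')['v'].count()
# size(): number of rows in each group, NaN or not.
sizes = toy.groupby('g').size()

print(counts.tolist())  # [1, 1]
print(sizes.tolist())   # [2, 1]
```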

Can I group by two columns in pandas?

Yes: pass a list of column names to groupby(), e.g. df.groupby(['col5', 'col2']). pandas provides many SQL-like aggregation functions you can apply when grouping on one or more columns; this is pandas' closest equivalent to dplyr's group_by + summarise.
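A minimal sketch of grouping on two columns and applying several aggregations at once with named aggregation (output_name=(input_column, aggfunc), available in recent pandas versions), using a numeric copy of the question's data. The output names n_rows, col3_mean and col1_max are made up for illustration:

```python
import pandas as pd

# Question's data with numeric col1/col3 (col4 omitted for brevity).
df = pd.DataFrame({
    'col1': [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    'col2': list('AAABBBBABCBDDD'),
    'col3': [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# Group on two columns; each keyword is output_name=(input_column, aggfunc).
summary = (
    df.groupby(['col5', 'col2'])
      .agg(n_rows=('col3', 'count'),
           col3_mean=('col3', 'mean'),
           col1_max=('col1', 'max'))
      .reset_index()  # turn the (col5, col2) MultiIndex back into columns
)
print(summary)
```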


1 Answer

You are looking for size:

In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64
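To get the flat layout from the question, with a named count column instead of a Series indexed by (col5, col2), one option is Series.reset_index with its name parameter. A sketch that only needs the two grouping columns from the question:

```python
import pandas as pd

# Only the two grouping columns from the question are needed here.
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# size() returns a Series indexed by (col5, col2); reset_index(name=...)
# moves the index levels back into columns and names the values 'count'.
counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')
print(counts)
```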

To get the same answer as waitingkuo's (the "second question"), but slightly more cleanly, group the resulting Series by its second index level (level=1, i.e. col2) and take the max:

In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A       3
B       2
C       1
D       3
dtype: int64
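If you also want to know which col5 value produced that maximum, one possible extension is to use idxmax on the flat counts table and select those rows with .loc. This is a sketch, not the answer's method; note that idxmax keeps the first row on ties, so for B (tied at 2 between col5 2 and 5) it returns col5 2:

```python
import pandas as pd

df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# Flat table of group sizes: columns col5, col2, count.
counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')

# For each col2, idxmax gives the row label of the largest count
# (the first such row wins on ties); .loc pulls those full rows.
top = counts.loc[counts.groupby('col2')['count'].idxmax()]
print(top)
```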
Andy Hayden answered Oct 17 '22 at 03:10