Pandas DataFrame Groupby two columns and get counts

I have a pandas dataframe in the following format:

import pandas as pd
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
                   list('AAABBBBABCBDDD'),
                   [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
                   ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
                   ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1']]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'col5']

df:

    col1 col2 col3     col4 col5
0    1.1    A  1.1    x/y/z    1
1    1.1    A  1.7      x/y    3
2    1.1    A  2.5  x/y/z/n    3
3    2.6    B  2.6      x/u    2
4    2.5    B  3.3        x    4
5    3.4    B  3.8    x/u/v    2
6    2.6    B    4    x/y/z    5
7    2.6    A  4.2        x    3
8    3.4    B  4.3  x/u/v/b    6
9    3.4    C  4.5        -    3
10   2.6    B  4.6      x/y    5
11   1.1    D  4.7    x/y/z    1
12   1.1    D  4.7        x    1
13   3.3    D  4.8  x/u/v/w    1

Now I want to group this by two columns, as follows:

df.groupby(['col5','col2']).reset_index()

Output:

             index col1 col2 col3     col4 col5
col5 col2
1    A    0      0  1.1    A  1.1    x/y/z    1
     D    0     11  1.1    D  4.7    x/y/z    1
          1     12  1.1    D  4.7        x    1
          2     13  3.3    D  4.8  x/u/v/w    1
2    B    0      3  2.6    B  2.6      x/u    2
          1      5  3.4    B  3.8    x/u/v    2
3    A    0      1  1.1    A  1.7      x/y    3
          1      2  1.1    A  2.5  x/y/z/n    3
          2      7  2.6    A  4.2        x    3
     C    0      9  3.4    C  4.5        -    3
4    B    0      4  2.5    B  3.3        x    4
5    B    0      6  2.6    B    4    x/y/z    5
          1     10  2.6    B  4.6      x/y    5
6    B    0      8  3.4    B  4.3  x/u/v/b    6

I want to get the row count for each group, as follows. Expected output:

col5 col2 count
1    A      1
     D      3
2    B      2
etc...

How can I get my expected output? I also want to find the largest count for each 'col2' value.

How do you group by in pandas and count?

Use pandas DataFrame.groupby() to group the rows by a column, then call the count() method to get the count for each group, ignoring None and NaN values. It works with non-floating-point data as well.
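The difference between count() and size() only shows up when values are missing. The following is a minimal sketch on a hypothetical toy frame (the names `toy`, `key` and `val` are made up for illustration and are not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not the question's data) with two missing values.
toy = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "val": [1.0, np.nan, 2.0, 3.0, np.nan],
})

# count() counts the non-missing values in each group.
non_null = toy.groupby("key")["val"].count()

# size() counts the rows in each group, missing values included.
rows = toy.groupby("key").size()

print(non_null)
print(rows)
```

If the goal is "how many rows fall in each group", size() is usually the safer choice, since count() can silently report fewer rows when a column contains NaN.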

Can I group by 2 columns in pandas?

Pandas comes with a whole host of SQL-like aggregation functions that you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
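As a rough illustration of grouping on two columns and applying several aggregations at once, here is a sketch using named aggregation. The frame `sales` and its columns are hypothetical and chosen only for the example:

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["N", "N", "S", "S", "S"],
    "product": ["x", "x", "x", "y", "y"],
    "amount": [10, 20, 5, 7, 9],
})

# Group on two columns and compute several aggregates per group.
# Each keyword names an output column: (input column, aggregation).
summary = (
    sales.groupby(["region", "product"])
    .agg(
        n=("amount", "count"),
        total=("amount", "sum"),
        largest=("amount", "max"),
    )
    .reset_index()
)

print(summary)
```

This plays a similar role to `group_by(region, product) %>% summarise(...)` in dplyr: one output row per distinct pair of grouping values.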

You are looking for size:

In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64
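size() returns a Series indexed by (col5, col2). If a flat table with an explicit count column is wanted, as in the question's expected output, one option is to call reset_index with a name. A minimal sketch, assuming a reduced copy of the question's frame that keeps only the two grouping columns:

```python
import pandas as pd

# Reduced copy of the question's data: only the two grouping columns.
df = pd.DataFrame({
    "col2": list("AAABBBBABCBDDD"),
    "col5": ["1", "3", "3", "2", "4", "2", "5", "3", "6", "3", "5", "1", "1", "1"],
})

# size() gives one value per (col5, col2) pair; reset_index(name=...)
# turns the index levels into columns and names the value column.
counts = df.groupby(["col5", "col2"]).size().reset_index(name="count")

print(counts)
```

The result is an ordinary DataFrame with columns col5, col2 and count, which is often easier to merge or sort than the MultiIndex Series.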

To get the same answer as waitingkuo (for the "second question"), but slightly more cleanly, group by the level:

In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A       3
B       2
C       1
D       3
dtype: int64
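The max() above reports the largest count per col2 value but not which col5 value produced it. If that is also of interest, one possible approach (a sketch, not part of the original answer) is to use idxmax on a flat counts table. It again assumes a reduced copy of the question's frame:

```python
import pandas as pd

# Reduced copy of the question's data: only the two grouping columns.
df = pd.DataFrame({
    "col2": list("AAABBBBABCBDDD"),
    "col5": ["1", "3", "3", "2", "4", "2", "5", "3", "6", "3", "5", "1", "1", "1"],
})

# One row per (col5, col2) pair with its row count.
counts = df.groupby(["col5", "col2"]).size().reset_index(name="count")

# For each col2 value, pick the row label with the largest count.
# On ties, idxmax keeps the first occurrence (here B: col5 '2' over '5').
best_rows = counts.groupby("col2")["count"].idxmax()
top = counts.loc[best_rows].reset_index(drop=True)

print(top)
```

Note that ties are resolved silently, so if every tied col5 value matters, compare each count against the per-group maximum instead.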


Tags: python, pandas, dataframe

Nilani Algiriyage


1 Answers

Andy Hayden

