
Sorting the grouped data as per group size in Pandas

I have two columns in my dataset, col1 and col2. I want to group the data by col1 and then sort the groups by size, so that the groups are displayed in ascending order of their size.

I have written the code for grouping and displaying the data as follows:

grouped_data = df.groupby('col1')
"""code for sorting comes here"""
for name, group in grouped_data:
    print(name)
    print(group)

Before displaying the data, I need to sort it as per group size, which I am not able to do.

krackoder asked Mar 10 '14 03:03




2 Answers

For pandas 0.17+, use sort_values (ascending=False puts the largest group first; omit it to get ascending order):

df.groupby('col1').size().sort_values(ascending=False)

For pre-0.17, you can use size().order():

df.groupby('col1').size().order(ascending=False)
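To print the groups themselves in ascending order of size, as the question asks, one option is to sort the sizes and then look each group up by name. This is only a sketch: the sample data is made up for illustration, and only the names col1, col2 and grouped_data come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "col1": ["a", "b", "b", "c", "c", "c"],
    "col2": [1, 2, 3, 4, 5, 6],
})

grouped_data = df.groupby("col1")

# Group sizes, smallest first (ascending=True is the default).
sizes = grouped_data.size().sort_values()

# The index of the sorted Series holds the group names in size order.
ordered_names = list(sizes.index)
for name in ordered_names:
    print(name)
    print(grouped_data.get_group(name))
```

Groups of equal size may come out in either order here, since the default sort is not guaranteed to be stable.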
Victor Yan answered Sep 28 '22 15:09


You can use Python's sorted:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])

In [12]: g = df.groupby('A')

In [13]: sorted(g,  # iterates pairs of (key, corresponding subDataFrame)
                key=lambda x: len(x[1]),  # sort by number of rows (len of subDataFrame)
                reverse=True)  # reverse the sort i.e. largest first
Out[13]: 
[(1,    A  B
     a  1  2
     b  1  4),
 (5,    A  B
     c  5  6)]

Note: when iterated, g yields pairs of the key and the corresponding sub-DataFrame:

In [14]: list(g)  # happens to be the same as the above...
Out[14]:
[(1,    A  B
     a  1  2
     b  1  4),
 (5,    A  B
     c  5  6)]
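For the ascending order asked for in the question, the same approach should work with reverse=True dropped. A minimal sketch reusing the df and g from above (the names ascending and keys_in_order are just illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])
g = df.groupby('A')

# Without reverse=True, sorted() puts the smallest group first.
ascending = sorted(g, key=lambda pair: len(pair[1]))

# Collect the keys to make the resulting order easy to inspect.
keys_in_order = [key for key, _ in ascending]

for key, sub in ascending:
    print(key)
    print(sub)
```

Since sorted is stable, groups of equal size keep the order in which the groupby iterates them.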
Andy Hayden answered Sep 28 '22 16:09