Sorting the grouped data as per group size in Pandas

Tags:

python

python-3.x

pandas

pandas-groupby

I have two columns in my dataset, col1 and col2. I want to group the data by col1 and then sort the groups by size. That is, I want to display the groups in ascending order of their size.

I have written the code for grouping and displaying the data as follows:

grouped_data = df.groupby('col1')
# code for sorting comes here
for name, group in grouped_data:
    print(name)
    print(group)

Before displaying the data, I need to sort the groups by size, which I have not been able to do.

310

asked Mar 10 '14 03:03

krackoder

2 Answers

For Pandas 0.17+, use sort_values (ascending=False puts the largest groups first; leave it out for ascending order):

df.groupby('col1').size().sort_values(ascending=False)

For pre-0.17, you can use size().order() (Series.order was deprecated in 0.17 in favor of sort_values):

df.groupby('col1').size().order(ascending=False)
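
The one-liner above gives the group sizes in sorted order, but not the groups themselves. A minimal sketch of how the sorted sizes could drive the display loop from the question is shown here; the sample data and the names sizes and ordered_groups are made up for illustration, and it assumes a reasonably recent pandas:

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "col1": ["a", "b", "b", "c", "c", "c"],
    "col2": [1, 2, 3, 4, 5, 6],
})

grouped_data = df.groupby("col1")

# Group sizes, smallest first (the order the question asks for).
sizes = grouped_data.size().sort_values()

# Walk the group keys in that order and fetch each sub-DataFrame.
ordered_groups = [(name, grouped_data.get_group(name)) for name in sizes.index]

for name, group in ordered_groups:
    print(name)
    print(group)
```

Passing ascending=False to sort_values would show the largest groups first instead.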

111

answered Sep 28 '22 15:09

Victor Yan

You can use Python's sorted:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])

In [12]: g = df.groupby('A')

In [13]: sorted(g,  # iterates pairs of (key, corresponding subDataFrame)
                key=lambda x: len(x[1]),  # sort by number of rows (len of subDataFrame)
                reverse=True)  # reverse the sort i.e. largest first
Out[13]: 
[(1,    A  B
     a  1  2
     b  1  4),
 (5,    A  B
     c  5  6)]

Note: when iterated, g yields pairs of the key and the corresponding subframe:

In [14]: list(g)  # happens to be the same as the above...
Out[14]:
[(1,    A  B
     a  1  2
     b  1  4),
 (5,    A  B
     c  5  6)]
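
Since the question asks for ascending order, the same idea can be written without reverse=True. This is only a sketch reusing the small frame from above; the name smallest_first is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=["a", "b", "c"], columns=["A", "B"])
g = df.groupby("A")

# Ascending by number of rows. sorted() is stable, so groups of equal
# size stay in the order the groupby yields them.
smallest_first = sorted(g, key=lambda pair: len(pair[1]))

for key, sub_df in smallest_first:
    print(key)
    print(sub_df)
```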

answered Sep 28 '22 16:09

Andy Hayden
