Sort Values in Descending Order with Groupby

You can sort values in descending order by passing ascending=False to the sort_values() method. The head(n) method returns the first n rows and is useful for quickly checking that your object contains the right kind of data. To group a Pandas DataFrame, use groupby(); to sort the grouped result in ascending or descending order, use sort_values(); and size() gives the number of rows in each group.
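A quick sketch of how these pieces fit together; the demo frame and its team/points columns below are made up purely for illustration:

import pandas as pd

demo = pd.DataFrame({"team": ["A", "A", "B", "B", "B"],
                     "points": [3, 1, 2, 5, 4]})

# number of rows per group, largest group first
demo.groupby("team").size().sort_values(ascending=False)

# whole frame sorted by points in descending order, keeping only the first two rows
demo.sort_values("points", ascending=False).head(2)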
Groupby preserves the order of rows within each group. When calling apply(), the group_keys option controls whether the group keys are added to the index of the result, so that you can identify which group each piece came from. The applied function should reduce the dimensionality of its return value if possible, and otherwise return a consistent type.
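As a small, made-up illustration of what group_keys changes when you apply() a function per group (the exact index behaviour has shifted a little across pandas versions, so treat this as a sketch for recent releases):

import pandas as pd

demo = pd.DataFrame({"team": ["A", "A", "B"], "points": [3, 1, 2]})

# group_keys=True (the default) prepends the group label to the result's index,
# so you can tell which group each piece came from
demo.groupby("team", group_keys=True)["points"].apply(
    lambda s: s.sort_values(ascending=False))

# group_keys=False keeps only the original index
demo.groupby("team", group_keys=False)["points"].apply(
    lambda s: s.sort_values(ascending=False))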
Pandas' GroupBy is a powerful and versatile tool. It allows you to split your data into separate groups and perform computations on each group for better analysis.
You could also just do it in one go, by doing the sort first and using head to take the first 3 of each group.
In[34]: df.sort_values(['job','count'],ascending=False).groupby('job').head(3)
Out[34]:
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B
What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.
Starting from the result of the first groupby:
In [60]: df_agg = df.groupby(['job','source']).agg({'count':sum})
We group by the first level of the index:
In [63]: g = df_agg['count'].groupby('job', group_keys=False)
Then we want to sort each group and take the first three elements:
In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))
However, there is a shortcut function that does exactly this, nlargest:
In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64
So in one go, this looks like:
df_agg['count'].groupby('job', group_keys=False).nlargest(3)
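For completeness, the same result can be written as a single chain from the raw df, simply stringing together the aggregation from In [60] and the nlargest call above:

(df.groupby(['job', 'source'])['count'].sum()
   .groupby('job', group_keys=False)
   .nlargest(3))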
Here's another example of taking the top 3 in sorted order, and of sorting within the groups:
In [43]: import pandas as pd
In [44]: df = pd.DataFrame({"name":["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"], "count_1":[5,10,12,15,20,25,30,35], "count_2" :[100,150,100,25,250,300,400,500]})
In [45]: df
Out[45]:
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar
### Top 3 on sorted order:
In [46]: df.groupby(["name"])["count_1"].nlargest(3)
Out[46]:
name
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64
### Sorting within groups based on column "count_1":
In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending = False)).reset_index(drop=True)
Out[48]:
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo
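Note that, because groupby() sorts the group keys by default, the same result as the apply() call above can usually be obtained by sorting on both columns at once:

df.sort_values(["name", "count_1"], ascending=[True, False]).reset_index(drop=True)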
Try this instead, which is a simple way to do a groupby and sort in descending order:
df.groupby(['companyName'])['overallRating'].sum().sort_values(ascending=False).head(20)
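The companyName and overallRating column names here come straight from the snippet itself; the pattern is generic: group on one column, sum another, sort the sums in descending order, and keep the top rows. A minimal sketch with made-up data:

import pandas as pd

ratings = pd.DataFrame({
    "companyName": ["Acme", "Acme", "Globex", "Globex", "Initech"],
    "overallRating": [4, 5, 3, 2, 5],
})

# total rating per company, biggest totals first, at most the top 20
ratings.groupby(['companyName'])['overallRating'].sum().sort_values(ascending=False).head(20)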
If you don't need to sum a column, then use @tvashtar's answer. If you do need to sum, then you can use @joris' answer or this one, which is very similar to it.
df.groupby(['job']).apply(lambda x: (x.groupby('source')
                                      .sum()
                                      .sort_values('count', ascending=False))
                                     .head(3))