Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas groupby sort within groups

People also ask

How do you sort values in Groupby pandas?

Sort Values in Descending Order with Groupby You can sort values in descending order by using ascending=False param to sort_values() method. The head() function is used to get the first n rows. It is useful for quickly testing if your object has the right type of data in it.

Does Groupby sort data pandas?

To group Pandas dataframe, we use groupby(). To sort grouped dataframe in ascending or descending order, use sort_values(). The size() method is used to get the dataframe size.

Does pandas Groupby keep order?

Groupby preserves the order of rows within each group. When calling apply, add group keys to index to identify pieces. Reduce the dimensionality of the return type if possible, otherwise return a consistent type.

What does Group_by do in pandas?

Pandas' GroupBy is a powerful and versatile function in Python. It allows you to split your data into separate groups to perform computations for better analysis.


You could also just do it in one go, by doing the sort first and using head to take the first 3 of each group.

In[34]: df.sort_values(['job','count'],ascending=False).groupby('job').head(3)

Out[35]: 
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B

What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.

Starting from the result of the first groupby:

In [60]: df_agg = df.groupby(['job','source']).agg({'count':sum})

We group by the first level of the index:

In [63]: g = df_agg['count'].groupby('job', group_keys=False)

Then we want to sort ('order') each group and take the first three elements:

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

However, for this, there is a shortcut function to do this, nlargest:

In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64

So in one go, this looks like:

df_agg['count'].groupby('job', group_keys=False).nlargest(3)

Here's other example of taking top 3 on sorted order, and sorting within the groups:

In [43]: import pandas as pd                                                                                                                                                       

In [44]:  df = pd.DataFrame({"name":["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"], "count_1":[5,10,12,15,20,25,30,35], "count_2" :[100,150,100,25,250,300,400,500]})

In [45]: df                                                                                                                                                                        
Out[45]: 
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar


### Top 3 on sorted order:
In [46]: df.groupby(["name"])["count_1"].nlargest(3)                                                                                                                               
Out[46]: 
name   
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64


### Sorting within groups based on column "count_1":
In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending = False)).reset_index(drop=True)
Out[48]: 
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo

Try this Instead, which is a simple way to do groupby and sorting in descending order:

df.groupby(['companyName'])['overallRating'].sum().sort_values(ascending=False).head(20)

If you don't need to sum a column, then use @tvashtar's answer. If you do need to sum, then you can use @joris' answer or this one which is very similar to it.

df.groupby(['job']).apply(lambda x: (x.groupby('source')
                                      .sum()
                                      .sort_values('count', ascending=False))
                                     .head(3))