Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Sort a dataframe based on multiple columns

I know that this question has been asked several times. But none of the answers match my case.

I've a pandas dataframe with columns,department and employee_count. I need to sort the employee_count column in descending order. But if there is a tie between 2 employee_counts then they should be sorted alphabetically based on department.

   Department Employee_Count
0    abc          10
1    adc          10
2    bca          11
3    cde          9
4    xyz          15

required output:

   Department Employee_Count
0    xyz          15
1    bca          11
2    abc          10
3    adc          10
4    cde          9

This is what I've tried.

df = df.sort_values(['Department','Employee_Count'],ascending=[True,False])

But this just sorts the departments alphabetically.

I've also tried to sort by Department first and then by Employee_Count. Like this:

df = df.sort_values(['Department'],ascending=[True])
df = df.sort_values(['Employee_Count'],ascending=[False])

This doesn't give me correct output either:

   Department Employee_Count
4    xyz          15
2    bca          11
1    adc          10
0    abc          10
3    cde          9

It gives 'adc' first and then 'abc'. Kindly help me.

like image 398
Impromptu_Coder Avatar asked Nov 04 '19 08:11

Impromptu_Coder


People also ask

How do you sort data frames based on columns?

Sorting Your DataFrame on a Single Column. To sort the DataFrame based on the values in a single column, you'll use . sort_values() . By default, this will return a new DataFrame sorted in ascending order.

Can you group by multiple columns in Pandas?

Pandas comes with a whole host of sql-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.

How do you sort a DataFrame in descending order?

To group Pandas dataframe, we use groupby(). To sort grouped dataframe in descending order, use sort_values(). The size() method is used to get the dataframe size.

How many ways can you sort Pandas DataFrame?

All of the sorting methods available in Pandas fall under the following three categories: Sorting by index labels; Sorting by column values; Sorting by a combination of index labels and column values.


1 Answers

You can swap columns in list and also values in ascending parameter:

Explanation:

Order of columns names is order of sorting, first sort descending by Employee_Count and if some duplicates in Employee_Count then sorting by Department only duplicates rows ascending.

df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print (df1)
  Department  Employee_Count
4        xyz              15
2        bca              11
0        abc              10 <-
1        adc              10 <-
3        cde               9

Or for test if use second False then duplicated rows are sorting descending:

df2 = df.sort_values(['Employee_Count', 'Department',],ascending=[False, False])
print (df2)
  Department  Employee_Count
4        xyz              15
2        bca              11
1        adc              10 <-
0        abc              10 <-
3        cde               9
like image 147
jezrael Avatar answered Oct 21 '22 10:10

jezrael