Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort Pandas DataFrame both by MultiIndex and by value?

Sample data:

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                    [1,5,10],[2,8,80],
                    [2,5,65],[2,8,10]
                   ], columns=['src','dst','n']); mdf

    src dst n
0   1   2   50
1   1   2   20
2   1   5   10
3   2   8   80
4   2   5   65
5   2   8   10

groupby() gives a two-level multi-index:

test = mdf.groupby(['src','dst'])['n'].agg(['sum','count']); test

        sum count
src dst 
1   2   70  2
    5   10  1
2   5   65  1
    8   90  2

Question: how to sort this DataFrame by src ascending and then by sum descending?

I'm a beginner with pandas, learned about sort_index() and sort_values(), but in this task it seems that I need both simultaneously.

Expected result, under each "src" sorting is determined by the "sum":

        sum count
src dst 
1   2   70  2
    5   10  1
2   8   90  2
    5   65  1
like image 348
Serge Avatar asked Mar 13 '18 19:03

Serge


People also ask

How do I sort by two values in pandas?

You can sort pandas DataFrame by one or multiple (one or more) columns using sort_values() method and by ascending or descending order. To specify the order, you have to use ascending boolean property; False for descending and True for ascending.

How do I sort a MultiIndex DataFrame in pandas?

However, to sort MultiIndex at a specific level, use the multiIndex. sortlevel() method in Pandas. Set the level as an argument. To sort in descending order, use the ascending parameter and set to False.

How do you sort a DataFrame based on values?

To sort the DataFrame based on the values in a single column, you'll use . sort_values() . By default, this will return a new DataFrame sorted in ascending order. It does not modify the original DataFrame.

How many ways can you sort pandas DataFrame?

All of the sorting methods available in Pandas fall under the following three categories: Sorting by index labels; Sorting by column values; Sorting by a combination of index labels and column values.


2 Answers

In case anyone else comes across this using google as well. Since pandas version 0.23, you can pass the name of the level as an argument to sort_values:

test.sort_values(['src','sum'], ascending=[1,0])

Result:
         sum  count
src dst            
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
like image 56
Pol Avatar answered Nov 11 '22 20:11

Pol


IIUC:

In [29]: test.sort_values('sum', ascending=False).sort_index(level=0)
Out[29]:
         sum  count
src dst
1   2     80      2
    5     10      1
2   8     80      1

UPDATE: very similar to @anonyXmous's solution:

In [47]: (test.reset_index()
              .sort_values(['src','sum'], ascending=[1,0])
              .set_index(['src','dst']))
Out[47]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
like image 7
MaxU - stop WAR against UA Avatar answered Nov 11 '22 21:11

MaxU - stop WAR against UA