
Sorting a multi-index while respecting its index structure

Tags: python, pandas

How can I sort a multi-index dataframe while respecting the organization of levels?

For example, given the following df, say we sort it by C in descending order:

                   C         D  E
A    B                           
bar  one   -0.346528  1.528538  1
     three -0.136710 -0.147842  1
flux six    0.795641 -1.610137  1
     three  1.051926 -1.316725  2
foo  five   0.906627  0.717922  0
     one   -0.152901 -0.043107  2
     two    0.542137 -0.373016  2
     two    0.329831  1.067820  1

We should get:

                   C         D  E
A    B                           
bar  three -0.136710 -0.147842  1
     one   -0.346528  1.528538  1
flux three  1.051926 -1.316725  2
     six    0.795641 -1.610137  1
foo  five   0.906627  0.717922  0
     two    0.542137 -0.373016  2
     two    0.329831  1.067820  1
     one   -0.152901 -0.043107  2

Note that by "respecting its index structure" I mean sorting the leaves of the dataframe without changing the ordering of the higher-level indices. In other words, I want to sort within the second level while keeping the ordering of the first level untouched.

What about doing the same in ascending order?

I read these two threads (yes, with the same title):

  • Multi-Index Sorting in Pandas
  • Multi Index Sorting in Pandas

but they sort the dataframes according to different criteria (e.g. index names, or a specific column in a group).

Amelio Vazquez-Reina asked Oct 14 '14 00:10



1 Answer

Call .reset_index, sort on columns A and C, and then set the index back. This will be more efficient than the earlier groupby solution (in pandas 0.17 and later, sort_values replaces the older DataFrame.sort):

>>> df.reset_index().sort_values(by=['A', 'C'], ascending=[True, False]).set_index(['A', 'B'])
                C      D  E
A    B                     
bar  three -0.137 -0.148  1
     one   -0.347  1.529  1
flux three  1.052 -1.317  2
     six    0.796 -1.610  1
foo  five   0.907  0.718  0
     two    0.542 -0.373  2
     two    0.330  1.068  1
     one   -0.153 -0.043  2
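
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the reset/sort/set approach on the question's data. It also covers the ascending case the question asks about. The helper name `sort_within_first_level` is made up for illustration, and the last variant assumes pandas 0.23 or later, where `by` may name index levels directly.

```python
import pandas as pd

# Sample frame copied from the question; (A, B) form the MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three"),
     ("foo", "five"), ("foo", "one"), ("foo", "two"), ("foo", "two")],
    names=["A", "B"],
)
df = pd.DataFrame(
    {
        "C": [-0.346528, -0.136710, 0.795641, 1.051926,
              0.906627, -0.152901, 0.542137, 0.329831],
        "D": [1.528538, -0.147842, -1.610137, -1.316725,
              0.717922, -0.043107, -0.373016, 1.067820],
        "E": [1, 1, 1, 2, 0, 2, 2, 1],
    },
    index=index,
)


def sort_within_first_level(frame, ascending):
    """Hypothetical helper: sort by A (ascending), then by C as requested."""
    return (
        frame.reset_index()
        .sort_values(by=["A", "C"], ascending=[True, ascending])
        .set_index(["A", "B"])
    )


descending = sort_within_first_level(df, ascending=False)
ascending = sort_within_first_level(df, ascending=True)

# Assumption: pandas 0.23+ accepts index level names in `by`,
# so the reset_index/set_index round trip can be skipped.
direct = df.sort_values(by=["A", "C"], ascending=[True, False])

print(descending)
print(ascending)
```

Note that this sorts level A alphabetically rather than leaving it in its original order; the two coincide here only because the question's frame already lists bar, flux, foo in that order.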

Earlier solution: .groupby(...).apply is relatively slow and may not scale very well:

>>> df['arg-sort'] = df.groupby(level='A')['C'].apply(pd.Series.argsort)
>>> f = lambda obj: obj.iloc[obj.loc[::-1, 'arg-sort'], :]
>>> df.groupby(level='A', group_keys=False).apply(f)
                C      D  E  arg-sort
A    B                               
bar  three -0.137 -0.148  1         1
     one   -0.347  1.529  1         0
flux three  1.052 -1.317  2         1
     six    0.796 -1.610  1         0
foo  five   0.907  0.718  0         1
     two    0.542 -0.373  2         2
     two    0.330  1.068  1         0
     one   -0.153 -0.043  2         3
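
If the first level is not already in sorted order and must stay exactly as it is, a groupby with `sort=False` is one way to keep it. The following is only a sketch, not the approach shown above: it swaps the argsort column for a plain `sort_values` inside each group. The helper name `sort_within_groups` and the small sample frame are invented for illustration.

```python
import pandas as pd

# Made-up sample data where level A is deliberately not alphabetical.
index = pd.MultiIndex.from_tuples(
    [("foo", "one"), ("foo", "two"), ("bar", "one"), ("bar", "three")],
    names=["A", "B"],
)
df = pd.DataFrame({"C": [0.1, 0.5, -0.3, -0.1]}, index=index)


def sort_within_groups(frame, level, column, ascending=False):
    """Hypothetical helper: sort each first-level group by `column`.

    sort=False keeps the groups in order of first appearance, and
    group_keys=False avoids adding the group key to the index again.
    """
    return frame.groupby(level=level, group_keys=False, sort=False).apply(
        lambda group: group.sort_values(column, ascending=ascending)
    )


result = sort_within_groups(df, level="A", column="C", ascending=False)
print(result)
```

Like the groupby version above, this calls a Python function once per group, so it is likely to be slower than the reset/sort/set approach on frames with many groups.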
behzad.nouri answered Oct 13 '22 09:10
