Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: how to sort dataframe by column AND by index

Given the DataFrame:

import pandas as pd
df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=['A'])

Results in:

   A
2  6
6  4
3  2
4  4
5  5

Now, I would like to sort by values of Column A AND the index.

e.g.

df.sort_values(by='A')

Returns

   A
3  2
6  4
4  4
5  5
2  6

Whereas I would like

   A
3  2
4  4
6  4
5  5
2  6

How can I get a sort on the column first and index second?

like image 403
David M Avatar asked Mar 19 '18 01:03

David M


3 Answers

You can sort by index and then by column A using kind='mergesort'.

This works because mergesort is stable.

res = df.sort_index().sort_values('A', kind='mergesort')

Result:

   A
3  2
4  4
6  4
5  5
2  6
like image 69
jpp Avatar answered Nov 01 '22 10:11

jpp


Using lexsort from numpy may be other way and little faster as well:

df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index

Result:

   A
3  2
4  4
6  4
5  5
2  6

Comparing with timeit:

%%timeit
df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index

Result:

1000 loops, best of 3: 278 µs per loop

With reset index and set index again:

 %%timeit
df.reset_index().sort_values(by=['A','index']).set_index('index')

Result:

100 loops, best of 3: 2.09 ms per loop
like image 8
student Avatar answered Nov 01 '22 08:11

student


The other answers are great. I'll throw in one other option, which is to provide a name for the index first using rename_axis and then reference it in sort_values. I have not tested the performance but expect the accepted answer to still be faster.

df.rename_axis('idx').sort_values(by=['A', 'idx'])

     A
idx   
3    2
4    4
6    4
5    5
2    6

You can clear the index name afterward if you want with df.index.name = None.

like image 2
totalhack Avatar answered Nov 01 '22 10:11

totalhack