Given the DataFrame:
import pandas as pd
df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=['A'])
Results in:
A
2 6
6 4
3 2
4 4
5 5
Now, I would like to sort by values of Column A AND the index.
e.g.
df.sort_values(by='A')
Returns
A
3 2
6 4
4 4
5 5
2 6
Whereas I would like
A
3 2
4 4
6 4
5 5
2 6
How can I get a sort on the column first and index second?
You can sort by index and then by column A using kind='mergesort'
.
This works because mergesort is stable.
res = df.sort_index().sort_values('A', kind='mergesort')
Result:
A
3 2
4 4
6 4
5 5
2 6
Using lexsort
from numpy may be other way and little faster as well:
df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index
Result:
A
3 2
4 4
6 4
5 5
2 6
Comparing with timeit
:
%%timeit
df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index
Result:
1000 loops, best of 3: 278 µs per loop
With reset index and set index again:
%%timeit
df.reset_index().sort_values(by=['A','index']).set_index('index')
Result:
100 loops, best of 3: 2.09 ms per loop
The other answers are great. I'll throw in one other option, which is to provide a name for the index first using rename_axis and then reference it in sort_values
. I have not tested the performance but expect the accepted answer to still be faster.
df.rename_axis('idx').sort_values(by=['A', 'idx'])
A
idx
3 2
4 4
6 4
5 5
2 6
You can clear the index name afterward if you want with df.index.name = None
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With