I have a pandas DataFrame that I need to sort in a particular order in one column, and just ascending in another. Both columns have repeated values. It looks more or less like this:
import pandas as pd
df = pd.DataFrame()
df[0] = pd.Series( [ 'a', 'aa', 'c' ] * 2 )
df[1] = pd.Series( [ 1, 2 ] * 3 )
df[2] = pd.Series( range(6) )
print( df )
0 1 2
0 a 1 0
1 aa 2 1
2 c 1 2
3 a 2 3
4 aa 1 4
5 c 2 5
Now, say that I need to order by columns 0 and 1, but not alphabetically: Column 0 should first follow an order:
order = [ 'a', 'c', 'aa' ]
How do I do that?
I would like to have it sorted like this:
print( sorted_df )
0 1 2
0 a 1 0
1 a 2 3
2 c 1 2
3 c 2 5
4 aa 1 4
5 aa 2 1
Using python 3.5.2, pandas 0.18.1
You can use pandas' categorical Series for this purpose which supplies the functionality of an individual sort order:
df[0] = pd.Categorical(df[0], order)
print(df.sort_values([0, 1]))
0 1 2
0 a 1 0
3 a 2 3
2 c 1 2
5 c 2 5
4 aa 1 4
1 aa 2 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With