Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort a DataFrame by two columns, using a custom order?

I have a pandas DataFrame that I need to sort in a particular order in one column, and just ascending in another. Both columns have repeated values. It looks more or less like this:

import pandas as pd

df = pd.DataFrame()
df[0] = pd.Series( [ 'a', 'aa', 'c' ] * 2 )
df[1] = pd.Series( [ 1, 2 ] * 3 )
df[2] = pd.Series( range(6) )
print( df )

    0  1  2
0   a  1  0
1  aa  2  1
2   c  1  2
3   a  2  3
4  aa  1  4
5   c  2  5

Now, say that I need to order by columns 0 and 1, but not alphabetically: Column 0 should first follow an order:

order = [ 'a', 'c', 'aa' ]

How do I do that?

I would like to have it sorted like this:

print( sorted_df )

    0  1  2
0   a  1  0
1   a  2  3
2   c  1  2
3   c  2  5
4  aa  1  4
5  aa  2  1

Using python 3.5.2, pandas 0.18.1

like image 979
Luis Avatar asked Dec 19 '22 09:12

Luis


1 Answers

You can use pandas' categorical Series for this purpose which supplies the functionality of an individual sort order:

 df[0] = pd.Categorical(df[0], order)
 print(df.sort_values([0, 1]))

    0   1   2
 0  a   1   0
 3  a   2   3
 2  c   1   2
 5  c   2   5
 4  aa  1   4
 1  aa  2   1
like image 199
pansen Avatar answered Feb 19 '23 14:02

pansen