
Index sort order of a multi-index dataframe does not respect categorical index order

Tags: python, pandas

I have a small dataframe with a two-level MultiIndex and one column. The second level of the index (level 1) sorts in alphabetical order, putting 'Four' before 'Three'.

import pandas as pd
df = pd.DataFrame({'A':[1,1,2,2],
  'B':['One','Two','Three', 'Four'], 
  'X':[1,2,3,4]},
  index=range(4)).set_index(['A','B']).sort_index()
df

         X
A B       
1 One    1
  Two    2
2 Four   4
  Three  3

Clearly the second level of the index (B) is in alphabetical order, so I tried replacing it with an ordered categorical index to force the correct ordering.

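# set_levels returns a new MultiIndex, so assign the result back
# (the inplace argument was removed in pandas 2.0)
df.index = df.index.set_levels(pd.CategoricalIndex(df.index.levels[1], 
       categories=['One','Two','Three', 'Four'], ordered=True), 
    level=1)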

With this done, inspecting the index shows that level 1 is indeed a categorical index, but sorting the index still does not put the rows in the desired order.

df.sort_index()

         X
A B       
1 One    1
  Two    2
2 Four   4
  Three  3

Note: If the dataframe has a simple index of one level only, this works as expected.
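To illustrate the single-level case mentioned in the note, here is a minimal sketch. The names `flat`, `flat_sorted` and `order` are made up for this example and are not part of the original code; the rows are deliberately built out of order so the sort has something to do.

```python
import pandas as pd

order = ['One', 'Two', 'Three', 'Four']

# Hypothetical single-level frame, deliberately built out of order
flat = pd.DataFrame({'B': ['Four', 'Three', 'Two', 'One'],
                     'X': [4, 3, 2, 1]}).set_index('B')

# Replace the plain index with an ordered categorical index
flat.index = pd.CategoricalIndex(flat.index, categories=order, ordered=True)

# Sorting now follows the category order rather than the alphabet
flat_sorted = flat.sort_index()
print(flat_sorted)
```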

Asked by Ymareth, Nov 16 '25 20:11


1 Answer

I managed to get this working by building the categorical level while setting the index, after the dataframe has been created. I am not sure this is the best answer, but it's an answer:

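import pandas as pd

df = pd.DataFrame({'A':[1,1,2,2],
   'B':['One','Two','Three', 'Four'], 
   'X':[1,2,3,4]})
# Build the ordered categorical level from column B while setting the index
df = df.set_index(['A', pd.CategoricalIndex(df['B'], categories=['One','Two','Three', 'Four'], ordered=True)])
# B was passed as an array, not as a column name, so set_index leaves the column in place
del df['B']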
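A possible variation on the same idea, sketched here as an assumption rather than as the method from the answer above: make column B an ordered categorical first, then set the index by column name. This way `set_index` drops the column itself and no `del` is needed. The names `df2` and `order` are hypothetical, and the rows are shuffled on purpose so the effect of `sort_index` is visible.

```python
import pandas as pd

order = ['One', 'Two', 'Three', 'Four']

# Rows deliberately out of order so that the sort has work to do
df2 = pd.DataFrame({'A': [2, 1, 2, 1],
                    'B': ['Four', 'Two', 'Three', 'One'],
                    'X': [4, 2, 3, 1]})

# Make B an ordered categorical column first, then move it into the index
df2['B'] = pd.Categorical(df2['B'], categories=order, ordered=True)
df2 = df2.set_index(['A', 'B']).sort_index()
print(df2)
```

Because the MultiIndex is built from a column that is already categorical, level B should carry the category order from the start, so 'Three' sorts before 'Four' within A == 2.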
Answered by gyx-hh, Nov 18 '25 10:11


