pandas custom sorting multilevel index

I have the following example dataset, and I'd like to sort the index levels by a custom order that is not contained in the DataFrame itself. So far, searching SO, I haven't been able to solve this. Example:

import pandas as pd

data = {'s':[1,1,1,1], 
        'am':['cap', 'cap', 'sea', 'sea'], 
        'cat':['i', 'o', 'i', 'o'],
        'col1':[.55, .44, .33, .22],
        'col2':[.77, .66, .55, .44]}

df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)

Out[1]: 
           col1  col2
s am  cat            
1 cap i    0.55  0.77
      o    0.44  0.66
  sea i    0.33  0.55
      o    0.22  0.44

What I would like is the following:

Out[2]: 
           col1  col2
s am  cat            
1 sea i    0.33  0.55
      o    0.22  0.44
  cap i    0.55  0.77
      o    0.44  0.66

I might also want to sort by 'cat' with the order ['o', 'i'].

How do I sort an index in Pandas?

To sort a pandas DataFrame by its index, use the DataFrame.sort_index() method. Set the boolean argument ascending to True or False to sort in ascending or descending order of the index. The rows are rearranged to follow the sorted index.

What does sort_index do in Pandas?

sort_index() sorts an object by its labels along the given axis. The sort is applied to the axis labels rather than to the data in the DataFrame, and the data is rearranged to match.
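
As a rough illustration of the options described above, the sketch below reuses the question's DataFrame. The variable names desc and mixed exist only for this example. Passing a list to ascending sets the direction for each index level separately.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

# Sort every index level in descending order
desc = df.sort_index(ascending=False)

# Sort 's' ascending, 'am' descending, 'cat' ascending
mixed = df.sort_index(level=['s', 'am', 'cat'],
                      ascending=[True, False, True])
print(mixed)
```

Note that mixed matches the order asked for in the question only because 'sea' before 'cap' happens to be reverse alphabetical; it is not a general custom order.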

What is the difference between LOC and ILOC in Pandas?

When selecting rows and columns of a pandas DataFrame, loc and iloc are the two commonly used indexers. The difference between them: loc selects rows and columns by label, while iloc selects them by integer position.
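
A small sketch of that difference on the question's DataFrame, where a row label is a tuple with one entry per index level. The names by_label and by_position are made up for this example.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

# loc: address the cell by its row label and column label
by_label = df.loc[(1, 'sea', 'i'), 'col1']

# iloc: address the same cell by position (third row, first column)
by_position = df.iloc[2, 0]

print(by_label, by_position)
```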

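Use sort_values and sort_index. Here level=1 is 'am', and the desired order ('sea' before 'cap') happens to be descending alphabetical, so ascending=False is enough: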

df.sort_values(df.columns.tolist()).sort_index(level=1, ascending=False, 
                                                        sort_remaining=False)

              col1  col2
s   am   cat        
1   sea  i    0.33  0.55
         o    0.22  0.44
    cap  i    0.55  0.77
         o    0.44  0.66
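
For an order that is not simply ascending or descending, one possible approach (a sketch, assuming pandas 1.1 or later, where sort_index accepts a key argument) is to map each label to a rank and sort on the ranks. The rank dictionary and the by_rank helper are hypothetical names introduced here, and the sketch assumes no label is shared between levels.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

# Desired position of each label: 'sea' before 'cap', 'o' before 'i'
rank = {'sea': 0, 'cap': 1, 'o': 0, 'i': 1}

def by_rank(level_values):
    # Called once per index level; labels without a rank (such as the
    # integers in 's') are passed through unchanged
    return level_values.map(lambda label: rank.get(label, label))

result = df.sort_index(key=by_rank)
print(result)
```

The key only affects the sort order; the index of result still holds the original labels.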

For an arbitrary custom order, convert the index level to an ordered categorical.

data = {'s':[1,1,1,1], 
            'am':['cap', 'cap', 'sea', 'sea'], 
            'cat':['i', 'j', 'k', 'l'],
            'col1':[.55, .44, .33, .22],
            'col2':[.77, .66, .55, .44]}

df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)

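# Ordered categorical whose category order is the desired sort order
idx = pd.Categorical(df.index.get_level_values('cat'),
          categories=['j','i','k','l'],
          ordered=True)

# set_levels(..., inplace=True) was removed in pandas 2.0, and relabelling the
# level is only safe when its values are unique and already sorted, so put the
# categorical into a column and sort on that instead
df.reset_index().assign(cat=idx).sort_values('cat').set_index(['s','am','cat'])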

             col1   col2
s   am  cat     
1   cap  j   0.44   0.66
         i   0.55   0.77
    sea  k   0.33   0.55
         l   0.22   0.44
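
Applied to the question's original data, the same idea might look like the sketch below. It converts both 'am' and 'cat' to ordered categoricals before setting the index. The orders ['sea', 'cap'] and ['o', 'i'] come from the question; the variable name result is just for this example.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data)

# The category order defines the sort order of each column
df['am'] = pd.Categorical(df['am'], categories=['sea', 'cap'], ordered=True)
df['cat'] = pd.Categorical(df['cat'], categories=['o', 'i'], ordered=True)

# Sort on the columns first, then move them into the index
keys = ['s', 'am', 'cat']
result = df.sort_values(keys).set_index(keys)
print(result)
```

To keep 'i' before 'o' as in the question's first desired output, leave 'cat' as plain strings and convert only 'am'.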

Tags:

python-3.x

pandas

fffrost

1 Answers

Abhi
