I have the following example dataset, and I'd like to sort the index levels by a custom order that is not contained in the DataFrame. I haven't been able to find a solution on SO so far. Example:
import pandas as pd
data = {'s':[1,1,1,1],
'am':['cap', 'cap', 'sea', 'sea'],
'cat':['i', 'o', 'i', 'o'],
'col1':[.55, .44, .33, .22],
'col2':[.77, .66, .55, .44]}
df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)
Out[1]:
           col1  col2
s am  cat
1 cap i    0.55  0.77
      o    0.44  0.66
  sea i    0.33  0.55
      o    0.22  0.44
What I would like is the following:
Out[2]:
           col1  col2
s am  cat
1 sea i    0.33  0.55
      o    0.22  0.44
  cap i    0.55  0.77
      o    0.44  0.66
I might also want to sort by 'cat' with the order ['o', 'i'].
To sort a pandas DataFrame by its index, use the DataFrame.sort_index() method. Set the boolean argument ascending to True or False to sort in ascending or descending order of the index. The rows are rearranged to follow the sorted index.
sort_index() sorts objects by labels along the given axis. The sort is applied to the axis labels rather than to the data in the DataFrame, and the data is then rearranged to match.
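As a minimal sketch of the ascending argument described above, the snippet below sorts a small frame by its index in both directions. The frame and its labels are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical frame with an unsorted string index.
frame = pd.DataFrame({'value': [10, 20, 30]}, index=['b', 'c', 'a'])

# Sort rows by index label; ascending=True is the default.
ascending = frame.sort_index()

# Same sort, but in descending label order.
descending = frame.sort_index(ascending=False)

print(ascending)
print(descending)
```

In both cases the values travel with their labels: only the row order changes.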
When it comes to selecting rows and columns of a pandas DataFrame, loc and iloc are the two commonly used indexers. The difference between them is subtle: loc selects rows and columns by label, while iloc selects rows and columns by integer position.
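To make the loc/iloc distinction concrete, here is a small sketch that uses the example data from the question. Both lookups below happen to return the same row; the variable names by_label and by_position are only illustrative.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

# loc: select the row whose index labels are s=1, am='sea', cat='i'.
by_label = df.loc[(1, 'sea', 'i'), :]

# iloc: select the third row (position 2), whatever its labels are.
by_position = df.iloc[2]

print(by_label)
print(by_position)
```

If the frame is re-sorted, the loc lookup still finds the same row, while the iloc lookup returns whichever row ends up in position 2.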
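Use sort_values and sort_index. Note that this works here because the desired order of 'am' ('sea', then 'cap') happens to be reverse alphabetical; it is not a general custom order:
df.sort_values(df.columns.tolist()).sort_index(level=1, ascending=False,
                                               sort_remaining=False)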
           col1  col2
s am  cat
1 sea i    0.33  0.55
      o    0.22  0.44
  cap i    0.55  0.77
      o    0.44  0.66
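For an order that is not simply ascending or descending, one option is the key argument of sort_index, which is applied to each index level separately. The sketch below assumes pandas 1.1 or later (where key was introduced); level_orders and custom_order are hypothetical names, and the ranks encode the orders requested in the question.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

# Desired rank of each label, per index level (lower rank sorts first).
level_orders = {
    'am': {'sea': 0, 'cap': 1},
    'cat': {'o': 0, 'i': 1},
}

def custom_order(level_values):
    """Map a level's labels to their ranks; leave other levels unchanged."""
    ranks = level_orders.get(level_values.name)
    if ranks is None:
        return level_values
    return level_values.map(ranks)

# The key only affects the sort order; the original labels are kept.
result = df.sort_index(key=custom_order)
print(result)
```

Dropping the 'cat' entry from level_orders leaves that level in its default alphabetical order, which gives the first layout asked for in the question.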
Convert the index to categorical to get the custom order:
data = {'s':[1,1,1,1],
'am':['cap', 'cap', 'sea', 'sea'],
'cat':['i', 'j', 'k', 'l'],
'col1':[.55, .44, .33, .22],
'col2':[.77, .66, .55, .44]}
df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)
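idx = pd.Categorical(df.index.get_level_values(2).values,
                     categories=['j','i','k','l'],
                     ordered=True)
# set_levels no longer accepts inplace=True (removed in pandas 2.0),
# so assign the returned index back instead
df.index = df.index.set_levels(idx, level='cat')
df.reset_index().sort_values('cat').set_index(['s','am','cat'])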
           col1  col2
s am  cat
1 cap j    0.44  0.66
      i    0.55  0.77
  sea k    0.33  0.55
      l    0.22  0.44
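The same categorical idea can be applied to the data from the question. This is a sketch rather than the only way to do it: it moves the index levels back into columns, converts 'am' and 'cat' to ordered categoricals with the requested orders, sorts, and rebuilds the index. The name flat is just an illustrative temporary.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

# Turn the index levels back into ordinary columns.
flat = df.reset_index()

# Ordered categoricals sort by category position, not alphabetically.
flat['am'] = pd.Categorical(flat['am'], categories=['sea', 'cap'], ordered=True)
flat['cat'] = pd.Categorical(flat['cat'], categories=['o', 'i'], ordered=True)

# Sort on the categorical columns, then restore the three-level index.
result = flat.sort_values(['s', 'am', 'cat']).set_index(['s', 'am', 'cat'])
print(result)
```

Because the order lives in the category definitions, changing the custom order only means editing the categories lists.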