Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort values in a Multi-index while keeping the index structure

I'd like to sort data of a multi-index dataframe, while keeping higher level indexes the same.

Here is the data sample:

data = {
    'Column 1': [1., 2., 3., 4.,34,2,5,6],
    'Index1 Title': [
        "Apples", "Apples", "Puppies", "Puppies",
        'Oranges','Oranges','Blue berries','Blue berries'],
    'index2 Title': [
        'Inside', 'Outside', 'Inside', 'Outside',
        'Inside', 'Outside', 'Inside', 'Outside'
    ]
}
df = pd.DataFrame(data)
df.set_index(['Index1 Title','index2 Title'],inplace=True)

I get this output:

Index1 Title index2 Title  Column 1
Apples       Inside         1.0
             Outside        2.0
Puppies      Inside         3.0
             Outside        4.0
Oranges      Inside         34.0
             Outside        2.0
Blue berries Inside         5.0
             Outside        6.0

When I try this bit of code:

df.sort_values('Column 1', ascending=False)

I get this:

Index1 Title    index2 Title    Column 1
Oranges         Inside          34.0
Blue berries    Outside         6.0
                Inside          5.0
Puppies         Outside         4.0
                Inside          3.0
Apples          Outside         2.0
Oranges         Outside         2.0
Apples          Inside          1.0

What I am trying to get, is something resembling what you would get using a pivot table on Excel, as so:

Index1 Title    index2 Title    Column 1
Oranges         Inside          34
                Outside         2
Blue berries    Inside          5
                Outside         6
Puppies         Inside          3
                Outside         4
Apples          Inside          1
                Outside         2

Basically making a sum of each Index1 Title values, and sorting them while keeping the Index2 Title structure intact.

I've been struggling with this for a certain time now, couldn't find a fix using standard pandas multi-indexing. Even the reset_index/sort_index routine didn't work

Would a for loop solve it?

like image 273
Sofiane Avatar asked Mar 01 '23 16:03

Sofiane


2 Answers

sort_values accepts index names, so you can do:

df.sort_values(['Index1 Title', 'Column 1'], ascending=[True, False])

output:

                           Column 1
Index1 Title index2 Title          
Apples       Outside            2.0
             Inside             1.0
Blue berries Outside            6.0
             Inside             5.0
Oranges      Inside            34.0
             Outside            2.0
Puppies      Outside            4.0
             Inside             3.0

Update Actually, I think I got what you want. There isn't a direct way to do this. You need to create a new series, sort that and reindex the data:

sorted_sum = (df.groupby(level=0).transform('sum')
                .sort_values(['Column 1', 'index2 Title'], 
                              ascending=[False,True], 
                              kind='mergesort')
             )

df.loc[sorted_sum.index]

Output:

                           Column 1
Index1 Title index2 Title          
Oranges      Inside            34.0
             Outside            2.0
Blue berries Inside             5.0
             Outside            6.0
Puppies      Inside             3.0
             Outside            4.0
Apples       Inside             1.0
             Outside            2.0

Note can't understand why the 2nd level index is reversed, even with mergesort.

like image 99
Quang Hoang Avatar answered Mar 08 '23 14:03

Quang Hoang


You can also use .reindex with the level argument to change 1 or more levels of a MultiIndex without changing the order of the others:

sorted_sums = df["Column 1"].sum(level=0).sort_values(ascending=False)
out = df.reindex(sorted_sums.index, level=0)

print(out)
                           Column 1
Index1 Title index2 Title
Oranges      Inside            34.0
             Outside            2.0
Blue berries Inside             5.0
             Outside            6.0
Puppies      Inside             3.0
             Outside            4.0
Apples       Inside             1.0
             Outside            2.0

Also- if memory serves me correctly there was a bug in the level argument of reindex. I did this on pandas version 1.1.1 so I can't guarantee it'll work on earlier versions.

like image 25
Cameron Riddell Avatar answered Mar 08 '23 13:03

Cameron Riddell