I'd like to sort data of a multi-index dataframe, while keeping higher level indexes the same.
Here is the data sample:
data = {
'Column 1': [1., 2., 3., 4.,34,2,5,6],
'Index1 Title': [
"Apples", "Apples", "Puppies", "Puppies",
'Oranges','Oranges','Blue berries','Blue berries'],
'index2 Title': [
'Inside', 'Outside', 'Inside', 'Outside',
'Inside', 'Outside', 'Inside', 'Outside'
]
}
df = pd.DataFrame(data)
df.set_index(['Index1 Title','index2 Title'],inplace=True)
I get this output:
Index1 Title index2 Title Column 1
Apples Inside 1.0
Outside 2.0
Puppies Inside 3.0
Outside 4.0
Oranges Inside 34.0
Outside 2.0
Blue berries Inside 5.0
Outside 6.0
When I try this bit of code:
df.sort_values('Column 1', ascending=False)
I get this:
Index1 Title index2 Title Column 1
Oranges Inside 34.0
Blue berries Outside 6.0
Inside 5.0
Puppies Outside 4.0
Inside 3.0
Apples Outside 2.0
Oranges Outside 2.0
Apples Inside 1.0
What I am trying to get, is something resembling what you would get using a pivot table on Excel, as so:
Index1 Title index2 Title Column 1
Oranges Inside 34
Outside 2
Blue berries Inside 5
Outside 6
Puppies Inside 3
Outside 4
Apples Inside 1
Outside 2
Basically making a sum of each Index1 Title values, and sorting them while keeping the Index2 Title structure intact.
I've been struggling with this for a certain time now, couldn't find a fix using standard pandas
multi-indexing. Even the reset_inde
x/sort_index
routine didn't work
Would a for loop solve it?
sort_values
accepts index names, so you can do:
df.sort_values(['Index1 Title', 'Column 1'], ascending=[True, False])
output:
Column 1
Index1 Title index2 Title
Apples Outside 2.0
Inside 1.0
Blue berries Outside 6.0
Inside 5.0
Oranges Inside 34.0
Outside 2.0
Puppies Outside 4.0
Inside 3.0
Update Actually, I think I got what you want. There isn't a direct way to do this. You need to create a new series, sort that and reindex the data:
sorted_sum = (df.groupby(level=0).transform('sum')
.sort_values(['Column 1', 'index2 Title'],
ascending=[False,True],
kind='mergesort')
)
df.loc[sorted_sum.index]
Output:
Column 1
Index1 Title index2 Title
Oranges Inside 34.0
Outside 2.0
Blue berries Inside 5.0
Outside 6.0
Puppies Inside 3.0
Outside 4.0
Apples Inside 1.0
Outside 2.0
Note can't understand why the 2nd level index is reversed, even with mergesort
.
You can also use .reindex
with the level
argument to change 1 or more levels of a MultiIndex
without changing the order of the others:
sorted_sums = df["Column 1"].sum(level=0).sort_values(ascending=False)
out = df.reindex(sorted_sums.index, level=0)
print(out)
Column 1
Index1 Title index2 Title
Oranges Inside 34.0
Outside 2.0
Blue berries Inside 5.0
Outside 6.0
Puppies Inside 3.0
Outside 4.0
Apples Inside 1.0
Outside 2.0
Also- if memory serves me correctly there was a bug in the level
argument of reindex
. I did this on pandas version 1.1.1
so I can't guarantee it'll work on earlier versions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With