In Pandas 0.17 I try to sort by a specific column while maintaining the hierarchical index (A and B). B is a running number created when setting up the dataframe through concatenation. My data looks like this:
               C   D
A   B
bar one    shiny  10
    two     dull   5
    three glossy   8
foo one     dull   3
    two    shiny   9
    three   matt  12
This is what I need:
               C   D
A   B
bar two     dull   5
    three glossy   8
    one    shiny  10
foo one     dull   3
    three   matt  12
    two    shiny   9
Below is the code I am using and the result. Note: Pandas 0.17 alerts that dataframe.sort will be deprecated.
df.sort_values(by="C", ascending=True)
               C   D
A   B
bar two     dull   5
foo one     dull   3
bar three glossy   8
foo three   matt  12
bar one    shiny  10
foo two    shiny   9
Adding .groupby produces the same result:
df.sort_values(by="C", ascending=True).groupby(axis=0, level=0, as_index=True)
Similarly, sorting the index first and then grouping by the column is not fruitful:
df.sort_index(axis=0, level=0, as_index=True).groupby(C, as_index=True)
I am not certain about reindexing. I need to keep the first index level A; the second index level B can be reassigned, but does not have to be. It would surprise me if there is no easy solution; I guess I just haven't found it. Any suggestions are appreciated.
Edit: In the meantime I dropped the second index level B, turned the first index level A into a column, sorted by multiple columns, and then set A as the index again:
df.index = df.index.droplevel(1)
df.reset_index(level=0, inplace=True)
df_sorted = df.sort_values(["A", "C"], ascending=[True, True])  # A is a column here, not an index.
df_reindexed = df_sorted.set_index("A")
Still very verbose.
You can sort a pandas DataFrame by one or more columns with the sort_values() method, in ascending or descending order. The order is controlled by the boolean ascending parameter: True for ascending, False for descending.
To sort the DataFrame by the values in a single column, use .sort_values(). By default, this returns a new DataFrame sorted in ascending order; it does not modify the original DataFrame.
However, to sort a MultiIndex at a specific level, use the MultiIndex.sortlevel() method. Pass the level as an argument. To sort in descending order, set the ascending parameter to False.
To rearrange the levels of a MultiIndex, use the MultiIndex.reorder_levels() method. Set the order of the levels with the order parameter.
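As a minimal sketch of that behaviour (the small frame here is made up for illustration and only borrows the column names C and D from the question):

```python
import pandas as pd

# Hypothetical three-row frame, not the asker's real data.
df = pd.DataFrame({"C": ["shiny", "dull", "glossy"], "D": [10, 5, 8]})

# Ascending sort on one column; returns a new DataFrame.
sorted_df = df.sort_values(by="C")

# Descending sort on another column.
desc_df = df.sort_values(by="D", ascending=False)

# The original frame keeps its row order.
print(df)
print(sorted_df)
print(desc_df)
```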
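A short sketch of both index methods, assuming a reasonably recent pandas and a small hand-built MultiIndex with the level names A and B from the question. Note that these operate on the index object itself, not on the column values, so on their own they do not solve the per-group column sort asked about above.

```python
import pandas as pd

# Hypothetical index for illustration, using the question's level names.
idx = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("foo", "one"), ("foo", "two")],
    names=["A", "B"],
)

# sortlevel returns a tuple: the sorted index and the positions used to sort it.
sorted_idx, indexer = idx.sortlevel(level="A", ascending=False)

# reorder_levels changes the order of the levels, not the order of the rows.
swapped = idx.reorder_levels(["B", "A"])

print(sorted_idx)
print(swapped)
```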
Feels like there could be a better way, but here's one approach:
In [163]: def sorter(sub_df):
     ...:     sub_df = sub_df.sort_values('C')
     ...:     sub_df.index = sub_df.index.droplevel(0)
     ...:     return sub_df
In [164]: df.groupby(level='A').apply(sorter)
Out[164]:
               C   D
A   B
bar two     dull   5
    three glossy   8
    one    shiny  10
foo one     dull   3
    three   matt  12
    two    shiny   9
Based on chrisb's code. Note that in my case it's a Series, not a DataFrame:
s.groupby(level='A', group_keys=False).apply(lambda x: x.sort_values(ascending=False))
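For reference, here is a self-contained sketch of the same idea applied to the DataFrame from the question. The frame is rebuilt by hand from the printed data. The second variant is an assumption about the pandas version in use: it relies on sort_values accepting index level names in by, which came in pandas 0.23, so it would not have worked in the 0.17 release the question targets.

```python
import pandas as pd

# Rebuild the frame from the question; A and B are index levels.
df = pd.DataFrame(
    {
        "A": ["bar", "bar", "bar", "foo", "foo", "foo"],
        "B": ["one", "two", "three", "one", "two", "three"],
        "C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
        "D": [10, 5, 8, 3, 9, 12],
    }
).set_index(["A", "B"])

# Sort each A group by C; group_keys=False avoids prepending A a second time.
by_group = df.groupby(level="A", group_keys=False).apply(
    lambda g: g.sort_values("C")
)

# Assumption: pandas 0.23 or later, where `by` may mix index level names
# and column labels, so no groupby is needed.
by_level = df.sort_values(["A", "C"])

print(by_group)
print(by_level)
```

Both variants keep the (A, B) MultiIndex and order the rows by C within each value of A.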