Python Pandas sorting by multiindex and column

Tags:

In Pandas 0.17 I try to sort by a specific column while maintaining the hierarchical index (A and B). B is a running number created when setting up the dataframe through concatenation. My data looks like this:

          C      D
A   B
bar one   shiny  10
    two   dull   5
    three glossy 8
foo one   dull   3
    two   shiny  9
    three matt   12

This is what I need:

          C      D
A   B
bar two   dull   5
    three glossy 8
    one   shiny  10
foo one   dull   3
    three matt   12
    two   shiny  9

Below is the code I am using and the result. Note: Pandas 0.17 alerts that dataframe.sort will be deprecated.

df.sort_values(by="C", ascending=True)
          C      D
A   B
bar two   dull   5
foo one   dull   3
bar three glossy 8
foo three matt   12
bar one   shiny  10
foo two   shiny  9

Adding .groupby produces the same result:

df.sort_values(by="C", ascending=True).groupby(axis=0, level=0, as_index=True)

Similarly, switching to sorting indices first, and then groupby the column is not fruitful:

df.sort_index(axis=0, level=0, as_index=True).groupby(C, as_index=True)

I am not certain about reindexing I need to keep the first index A, second index B can be reassigned, but does not have to. It would surprise me if there is not an easy solution; I guess I just don't find it. Any suggestions are appreciated.

Edit: In the meantime I dropped the second index B, reassigned first index A to be a column instead of an index sorted multiple columns, then re-indexed it:

df.index = df.index.droplevel(1)
df.reset_index(level=0, inplace=True)
df_sorted = df.sort_values(["A", "C"], ascending=[1,1]) #A is a column here, not an index.
df_reindexed = df_sorted.set_index("A")

Still very verbose.

805

asked Oct 17 '15 19:10

raummensch

2 Answers

Feels like there could be a better way, but here's one approach:

In [163]: def sorter(sub_df):
     ...:     sub_df = sub_df.sort_values('C')
     ...:     sub_df.index = sub_df.index.droplevel(0)
     ...:     return sub_df

In [164]: df.groupby(level='A').apply(sorter)
Out[164]: 
                C   D
A   B                
bar two      dull   5
    three  glossy   8
    one     shiny  10
foo one      dull   3
    three    matt  12
    two     shiny   9

107

answered Oct 22 '22 23:10

chrisb

Based on chrisb's code:

Note that in my case, it's a Series not a DataFrame,

s.groupby(level='A', group_keys=False).apply(lambda x: x.sort_values(ascending=False))

answered Oct 23 '22 01:10

G. Cheng

Related questions
                            
                                Parse logs containing python tracebacks using logstash
                            
                                Scrapy get all children / ignore <br>?
                            
                                ipython notebook terminals unavailable [duplicate]
                            
                                Some doubts modelling some features for the libsvm/scikit-learn library in python
                            
                                Should I create a new Pool object every time or reuse a single one?
                            
                                Can Python be made statically typed?
                            
                                Why does 9007199254740993 != 9007199254740993.0?
                            
                                ValueError when adding row to Dataframe
                            
                                Speed up for loop in convolution for numpy 3D array?
                            
                                interrupt thread with start_consuming method of pika
                            
                                No module named pyvirtualdisplay
                            
                                Delete and Edit items in TreeView Tkinter
                            
                                Is there a way to plot a Line2D in points coordinates in Matplotlib in Python?
                            
                                How would I get psycopg2 logging of query execution time?
                            
                                How to know a generated sequence is at most a certain length
                            
                                Python - Control window with pywinauto while the window is minimized or hidden
                            
                                How can I save the same form more than once in Django 1.8?
                            
                                Prevent string being printed python
                            
                                Determining pixel height of a string with newlines in Pillow
                            
                                How to set default value using SqlAlchemy_Utils ChoiceType

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python Pandas sorting by multiindex and column

Tags:

python

sorting

indexing

pandas

raummensch

People also ask

2 Answers

chrisb

G. Cheng

Recent Activity

Donate For Us