Multi-Index Sorting in Pandas

Tags:

I have a multi-index DataFrame created via a groupby operation. I'm trying to do a compound sort using several levels of the index, but I can't seem to find a sort function that does what I need.

Initial dataset looks something like this (daily sales counts of various products):

         Date Manufacturer Product Name Product Launch Date  Sales
0  2013-01-01        Apple         iPod          2001-10-23     12
1  2013-01-01        Apple         iPad          2010-04-03     13
2  2013-01-01      Samsung       Galaxy          2009-04-27     14
3  2013-01-01      Samsung   Galaxy Tab          2010-09-02     15
4  2013-01-02        Apple         iPod          2001-10-23     22
5  2013-01-02        Apple         iPad          2010-04-03     17
6  2013-01-02      Samsung       Galaxy          2009-04-27     10
7  2013-01-02      Samsung   Galaxy Tab          2010-09-02      7

I use groupby to get a sum over the date range:

> grouped = df.groupby(['Manufacturer', 'Product Name', 'Product Launch Date']).sum()
                                               Sales
Manufacturer Product Name Product Launch Date       
Apple        iPad         2010-04-03              30
             iPod         2001-10-23              34
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22

So far so good!

Now the last thing I want to do is sort each manufacturer's products by launch date, but keep them grouped hierarchically under Manufacturer - here's all I am trying to do:

                                               Sales
Manufacturer Product Name Product Launch Date       
Apple        iPod         2001-10-23              34
             iPad         2010-04-03              30
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22

When I try sortlevel() I lose the nice per-company hierarchy I had before:

> grouped.sortlevel('Product Launch Date')
                                               Sales
Manufacturer Product Name Product Launch Date       
Apple        iPod         2001-10-23              34
Samsung      Galaxy       2009-04-27              24
Apple        iPad         2010-04-03              30
Samsung      Galaxy Tab   2010-09-02              22

sort() and sort_index() just fail:

grouped.sort(['Manufacturer','Product Launch Date'])
KeyError: u'no item named Manufacturer'

grouped.sort_index(by=['Manufacturer','Product Launch Date'])
KeyError: u'no item named Manufacturer'

Seems like a simple operation, but I can't quite figure it out.

I'm not tied to using a MultiIndex for this, but since that's what groupby() returns, that's what I've been working with.

BTW the code to produce the initial DataFrame is:

data = {
  'Date': ['2013-01-01', '2013-01-01', '2013-01-01', '2013-01-01', '2013-01-02', '2013-01-02', '2013-01-02', '2013-01-02'],
  'Manufacturer' : ['Apple', 'Apple', 'Samsung', 'Samsung', 'Apple', 'Apple', 'Samsung', 'Samsung',],
  'Product Name' : ['iPod', 'iPad', 'Galaxy', 'Galaxy Tab', 'iPod', 'iPad', 'Galaxy', 'Galaxy Tab'], 
  'Product Launch Date' : ['2001-10-23', '2010-04-03', '2009-04-27', '2010-09-02','2001-10-23', '2010-04-03', '2009-04-27', '2010-09-02'],
  'Sales' : [12, 13, 14, 15, 22, 17, 10, 7]
}
df = DataFrame(data, columns=['Date', 'Manufacturer', 'Product Name', 'Product Launch Date', 'Sales'])

579

asked Jun 21 '13 19:06

Keeth

1 Answers

A hack would be to change the order of the levels:

In [11]: g
Out[11]:
                                               Sales
Manufacturer Product Name Product Launch Date
Apple        iPad         2010-04-03              30
             iPod         2001-10-23              34
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22

In [12]: g.index = g.index.swaplevel(1, 2)

Sortlevel, which (as you've found) sorts the MultiIndex levels in order:

In [13]: g = g.sortlevel()

And swap back:

In [14]: g.index = g.index.swaplevel(1, 2)

In [15]: g
Out[15]:
                                               Sales
Manufacturer Product Name Product Launch Date
Apple        iPod         2001-10-23              34
             iPad         2010-04-03              30
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22

I'm of the opinion that sortlevel should not sort the remaining labels in order, so will create a github issue. :) Although it's worth mentioning the docnote about "the need for sortedness".

Note: you could avoid the first swaplevel by reordering the order of the initial groupby:

g = df.groupby(['Manufacturer', 'Product Launch Date', 'Product Name']).sum()

answered Oct 02 '22 07:10

Andy Hayden

Related questions
                            
                                Shiny R application that allows users to modify data
                            
                                Android Studio need to rebuild project before each run
                            
                                Is it bad to have the same name for parameter as for member variable? [duplicate]
                            
                                How to ignore libraries in browserify programmatic api
                            
                                Inconsistency in WPF command routing behavior depending on the UI focus state
                            
                                Vagrant provision live output
                            
                                Difference between Text and String in Hadoop
                            
                                How portable is using the low bit of a pointer as a flag?
                            
                                Dynamically created method and decorator, got error 'functools.partial' object has no attribute '__module__'
                            
                                How to get current path of the file that run the script in nodejs
                            
                                Javascript load vs ready vs domready vs DOMContentLoaded events
                            
                                The connection to adb is down, and a severe error has occured.You must restart adb and Eclipse.Please ensure that adb is correctly located

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With