How to do a 'groupby' by multilevel index in Pandas

Tags:

pandas

I have a dataframe 'RPT' indexed by (STK_ID, RPT_Date) that contains the accumulated sales of stocks for each quarter:

                       sales
STK_ID  RPT_Date
000876  20060331      798627000
        20060630     1656110000
        20060930     2719700000
        20061231     3573660000
        20070331      878415000
        20070630     2024660000
        20070930     3352630000
        20071231     4791770000
600141  20060331      270912000
        20060630      658981000
        20060930     1010270000
        20061231     1591500000
        20070331      319602000
        20070630      790670000
        20070930     1250530000
        20071231     1711240000

I want to calculate the sales for each single quarter using 'groupby' by STK_ID and RPT_Yr, something like: RPT.groupby('STK_ID','RPT_Yr')['sales'].transform(lambda x: x-x.shift(1)). How can I do that?

Suppose I can get the year with lambda x : datetime.strptime(x, '%Y%m%d').year

252

asked Aug 30 '12 06:08

1 Answer

I am assuming here that RPT_Date is a string. Is there any reason not to use datetime?
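As a rough illustration of the datetime suggestion, the sketch below parses 'YYYYMMDD' strings with pd.to_datetime and reads the year from the result. The three sample dates are taken from the question; the variable names dates, parsed and years are made up for this example.

```python
import pandas as pd

# Sample report dates from the question, stored as 'YYYYMMDD' strings.
dates = pd.Series(["20060331", "20061231", "20070630"], name="RPT_Date")

# Parse the strings into real timestamps, then extract the year.
parsed = pd.to_datetime(dates, format="%Y%m%d")
years = parsed.dt.year

print(years.tolist())
```

With real timestamps the year (or quarter) is available through the .dt accessor, so no strptime lambda is needed.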

It is possible to group by using functions, but only on a non-MultiIndex index. You can work around this by resetting the index and setting 'RPT_Date' as the index to extract the year (note: pandas toggles between object and int as the dtype for 'RPT_Date').

In [134]: from datetime import datetime

In [135]: year = lambda x : datetime.strptime(str(x), '%Y%m%d').year

In [136]: grouped = RPT.reset_index().set_index('RPT_Date').groupby(['STK_ID', year])

In [137]: for key, df in grouped:
   .....:     print(key)
   .....:     print(df)
   .....:
(876, 2006)
          STK_ID       sales
RPT_Date
20060331     876   798627000
20060630     876  1656110000
20060930     876  2719700000
20061231     876  3573660000
(876, 2007)
          STK_ID       sales
RPT_Date
20070331     876   878415000
20070630     876  2024660000
20070930     876  3352630000
20071231     876  4791770000
(600141, 2006)
          STK_ID       sales
RPT_Date
20060331  600141   270912000
20060630  600141   658981000
20060930  600141  1010270000
20061231  600141  1591500000
(600141, 2007)
          STK_ID       sales
RPT_Date
20070331  600141   319602000
20070630  600141   790670000
20070930  600141  1250530000
20071231  600141  1711240000
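For comparison, here is a sketch that leaves the MultiIndex in place and passes two precomputed key arrays to groupby instead of a function. It assumes a reasonably recent pandas version and that RPT_Date is stored as strings; the names stk and yr are made up for this example, and this is an alternative to the reset_index approach above, not a replacement for it.

```python
from datetime import datetime

import pandas as pd

# Rebuild the frame from the question with a (STK_ID, RPT_Date) MultiIndex.
index = pd.MultiIndex.from_product(
    [["000876", "600141"],
     ["20060331", "20060630", "20060930", "20061231",
      "20070331", "20070630", "20070930", "20071231"]],
    names=["STK_ID", "RPT_Date"],
)
sales = [798627000, 1656110000, 2719700000, 3573660000,
         878415000, 2024660000, 3352630000, 4791770000,
         270912000, 658981000, 1010270000, 1591500000,
         319602000, 790670000, 1250530000, 1711240000]
RPT = pd.DataFrame({"sales": sales}, index=index)

year = lambda x: datetime.strptime(str(x), "%Y%m%d").year

# One key array per grouping dimension, each as long as the frame.
stk = RPT.index.get_level_values("STK_ID")
yr = RPT.index.get_level_values("RPT_Date").map(year)

grouped = RPT.groupby([stk, yr])
for key, df in grouped:
    print(key, len(df))
```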

Another option is to use a temporary column:

In [153]: RPT_tmp = RPT.reset_index()

In [154]: RPT_tmp['year'] = RPT_tmp['RPT_Date'].apply(year)

In [155]: grouped = RPT_tmp.groupby(['STK_ID', 'year'])
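To show where the temporary column leads, the sketch below carries the grouping through to the per-quarter figures on a small slice of the data. It assumes string dates and that the first report of each year already equals that quarter's sales; the column name q_sales is made up for this example.

```python
import pandas as pd

# A slice of the question's data in flat (reset_index) form.
RPT_tmp = pd.DataFrame({
    "STK_ID": ["000876"] * 5,
    "RPT_Date": ["20060331", "20060630", "20060930", "20061231", "20070331"],
    "sales": [798627000, 1656110000, 2719700000, 3573660000, 878415000],
})

# Temporary 'year' column taken from the first four characters of the date.
RPT_tmp["year"] = RPT_tmp["RPT_Date"].str[:4].astype(int)

grouped = RPT_tmp.groupby(["STK_ID", "year"])["sales"]

# diff() is NaN on the first row of every group; fall back to the
# accumulated value there, since the first quarter has nothing to subtract.
RPT_tmp["q_sales"] = grouped.diff().fillna(RPT_tmp["sales"]).astype("int64")

print(RPT_tmp)
```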

EDIT: Reorganising your frame makes it much easier.

In [48]: RPT
Out[48]: 
                                  sales
STK_ID RPT_Year RPT_Quarter            
876    2006     0             798627000
                1            1656110000
                2            2719700000
                3            3573660000
       2007     0             878415000
                1            2024660000
                2            3352630000
                3            4791770000
600141 2006     0             270912000
                1             658981000
                2            1010270000
                3            1591500000
       2007     0             319602000
                1             790670000
                2            1250530000
                3            1711240000
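The answer does not show how the frame was reorganised. One possible way, sketched here for a single stock and year, is to derive the year and a zero-based quarter from the report date and use them as index levels. The name flat and the use of pd.to_datetime are assumptions of this sketch, not the answerer's stated method.

```python
import pandas as pd

# Flat version of four rows from the question (hypothetical starting point).
flat = pd.DataFrame({
    "STK_ID": [876, 876, 876, 876],
    "RPT_Date": ["20060331", "20060630", "20060930", "20061231"],
    "sales": [798627000, 1656110000, 2719700000, 3573660000],
})

dates = pd.to_datetime(flat["RPT_Date"], format="%Y%m%d")
flat["RPT_Year"] = dates.dt.year
# dt.quarter is 1..4; subtract 1 to match the 0..3 numbering shown above.
flat["RPT_Quarter"] = dates.dt.quarter - 1

RPT = flat.set_index(["STK_ID", "RPT_Year", "RPT_Quarter"])[["sales"]]
print(RPT)
```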

In [49]: RPT.groupby(level=['STK_ID', 'RPT_Year'])['sales'].apply(sale_per_q)
Out[49]: 
STK_ID  RPT_Year  RPT_Quarter
876     2006      0               798627000
                  1               857483000
                  2              1063590000
                  3               853960000
        2007      0               878415000
                  1              1146245000
                  2              1327970000
                  3              1439140000
600141  2006      0               270912000
                  1               388069000
                  2               351289000
                  3               581230000
        2007      0               319602000
                  1               471068000
                  2               459860000
                  3               460710000
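The definition of sale_per_q is not given in the answer. A plausible reconstruction, offered only as an assumption, is a function that subtracts the previous quarter's accumulated figure and leaves the first quarter as it is. The sketch uses transform rather than apply so that the result keeps the original index whatever the pandas version.

```python
import pandas as pd

# Reorganised frame from the answer: (STK_ID, RPT_Year, RPT_Quarter) index.
index = pd.MultiIndex.from_product(
    [[876, 600141], [2006, 2007], [0, 1, 2, 3]],
    names=["STK_ID", "RPT_Year", "RPT_Quarter"],
)
sales = [798627000, 1656110000, 2719700000, 3573660000,
         878415000, 2024660000, 3352630000, 4791770000,
         270912000, 658981000, 1010270000, 1591500000,
         319602000, 790670000, 1250530000, 1711240000]
RPT = pd.DataFrame({"sales": sales}, index=index)


def sale_per_q(x):
    # Hypothetical definition: difference to the previous quarter;
    # the first quarter of a group keeps its accumulated value.
    return x.diff().fillna(x)


result = (
    RPT.groupby(level=["STK_ID", "RPT_Year"])["sales"]
    .transform(sale_per_q)
    .astype("int64")
)
print(result)
```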

158

answered Sep 20 '22 14:09

Wouter Overmeire
