
Resampling a multi-index DataFrame

Tags: python, pandas

I want to resample a DataFrame with a multi-index containing both a datetime level and some other key. The DataFrame looks like:

import pandas as pd
from StringIO import StringIO

csv = StringIO("""ID,NAME,DATE,VAR1
1,a,03-JAN-2013,69
1,a,04-JAN-2013,77
1,a,05-JAN-2013,75
2,b,03-JAN-2013,69
2,b,04-JAN-2013,75
2,b,05-JAN-2013,72""")

df = pd.read_csv(csv, index_col=['DATE', 'ID'], parse_dates=['DATE'])
df.columns.name = 'Params'

Because resampling is only allowed on datetime indexes, I thought unstacking the other index level would help. And indeed it does, but I can't stack it again afterwards.

print df.unstack('ID').resample('W-THU')

Params      VAR1      
ID               1     2
DATE                    
2013-01-03      69  69.0
2013-01-10      76  73.5

But then stacking 'ID' again results in an IndexError:

print df.unstack('ID').resample('W-THU').stack('ID')

IndexError: index 0 is out of bounds for axis 0 with size 0

Strangely enough, I can stack the other column level with both:

print df.unstack('ID').resample('W-THU').stack(0)

and

print df.unstack('ID').resample('W-THU').stack('Params')

The IndexError also occurs if I reorder (swap) both column levels. Does anyone know how to overcome this issue?

Rutger Kassies asked Mar 25 '13 14:03



1 Answer

The example unstacks a non-numerical column, 'NAME', which is silently dropped by the resample but causes problems during re-stacking. Selecting only 'VAR1' before unstacking worked for me:

print df[['VAR1']].unstack('ID').resample('W-THU').stack('ID')

Params         VAR1
DATE       ID
2013-01-03 1   69.0
           2   69.0
2013-01-10 1   76.0
           2   73.5
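
For readers on Python 3 and a current pandas release, the snippet above can be sketched roughly as follows. This is an adaptation under stated assumptions, not the answerer's original code: `resample` now needs an explicit aggregation, so `.mean()` is added to match the averages shown above, and the dates are parsed with an assumed `'%d-%b-%Y'` format. The second approach, grouping with `pd.Grouper`, is an alternative that avoids the unstack/stack round trip entirely.

```python
from io import StringIO

import pandas as pd

csv = StringIO("""ID,NAME,DATE,VAR1
1,a,03-JAN-2013,69
1,a,04-JAN-2013,77
1,a,05-JAN-2013,75
2,b,03-JAN-2013,69
2,b,04-JAN-2013,75
2,b,05-JAN-2013,72""")

# Assumption: dates follow the day-month abbreviation-year pattern of the sample.
df = pd.read_csv(csv)
df['DATE'] = pd.to_datetime(df['DATE'], format='%d-%b-%Y')
df = df.set_index(['DATE', 'ID'])
df.columns.name = 'Params'

# Approach 1: keep only the numeric column, move 'ID' into the columns,
# resample weekly (weeks ending Thursday), then move 'ID' back into the index.
weekly = df[['VAR1']].unstack('ID').resample('W-THU').mean().stack('ID')

# Approach 2: group by weekly bins on 'DATE' and by 'ID' in a single step.
grouped = df.groupby(
    [pd.Grouper(level='DATE', freq='W-THU'), pd.Grouper(level='ID')]
)['VAR1'].mean()

print(weekly)
print(grouped)
```

Both approaches select 'VAR1' before aggregating, so the string column 'NAME' never reaches `mean()`.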
user1827356 answered Oct 30 '22 08:10
