
Resample with categories in pandas, keep non-numerical columns

Tags: python, pandas

I have hourly data for a variable X across three categories, identified by a Category column. ds is set as the index.

> df

ds                   Category   X
2010-01-01 01:00:00     A       32
2010-01-01 01:00:00     B       13
2010-01-01 01:00:00     C       09
2010-01-01 02:00:00     A       12
2010-01-01 02:00:00     B       62
2010-01-01 02:00:00     C       12

I want to resample it to weekly frequency. But if I use df2 = df.resample('W').mean(), it simply drops the 'Category' column.

Martan asked May 06 '19 09:05


2 Answers

If you need to resample per Category per week, add a groupby so that DataFrameGroupBy.resample is used:

Note:
A DatetimeIndex is required for this to work correctly.

df2 = df.groupby('Category').resample('W').mean()
print(df2)
                        X
Category ds              
A        2010-01-03  22.0
B        2010-01-03  37.5
C        2010-01-03  10.5
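
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the six sample rows from the question and runs the grouped weekly resample. The DataFrame construction is an assumption (the question only shows the printed frame), and selecting the X column explicitly before resampling is a small variation on the line above, chosen so that only the numeric column is aggregated.

```python
import pandas as pd

# Sample rows copied from the question; how the frame is built is an assumption.
df = pd.DataFrame(
    {
        "ds": pd.to_datetime(
            ["2010-01-01 01:00:00"] * 3 + ["2010-01-01 02:00:00"] * 3
        ),
        "Category": ["A", "B", "C"] * 2,
        "X": [32, 13, 9, 12, 62, 12],
    }
).set_index("ds")  # resample needs a DatetimeIndex

# Group by Category first, then resample each group to weekly bins.
# Selecting "X" explicitly is a variation on the answer, not part of it.
weekly = df.groupby("Category")["X"].resample("W").mean()

print(weekly)
```

The result is a Series indexed by (Category, ds); calling .to_frame() on it gives a one-column DataFrame like the one printed above.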

jezrael answered Oct 21 '22 02:10


To complete the answer by jezrael, I found it useful to turn the result back into a flat DataFrame indexed by ds, instead of one indexed by a (Category, ds) MultiIndex, as explained here. So, the answer will be:

df2 = df.groupby('Category').resample('W').mean()

# reset_index moves the Category and ds index levels back into regular columns
df2 = df2.reset_index()

# set the timestamp as the index again
df2 = df2.set_index("ds")
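
As a self-contained sketch of the flattening step, the snippet below rebuilds the sample data from the question (the construction code is an assumption), applies the two-step reset_index / set_index approach, and compares it with a one-step alternative that moves only the Category level out of the index via reset_index(level=...). The one-step form is not from the answers here; it is offered as a possible shortcut.

```python
import pandas as pd

# Sample rows copied from the question; how the frame is built is an assumption.
df = pd.DataFrame(
    {
        "ds": pd.to_datetime(
            ["2010-01-01 01:00:00"] * 3 + ["2010-01-01 02:00:00"] * 3
        ),
        "Category": ["A", "B", "C"] * 2,
        "X": [32, 13, 9, 12, 62, 12],
    }
).set_index("ds")

# Weekly mean per category, as a one-column DataFrame with a (Category, ds) MultiIndex.
weekly = df.groupby("Category")["X"].resample("W").mean().to_frame()

# Two-step version: flatten everything, then make ds the index again.
two_step = weekly.reset_index().set_index("ds")

# One-step alternative: move only the Category level into a column.
one_step = weekly.reset_index(level="Category")

print(two_step)
print(one_step)
```

Both frames should end up with ds as a DatetimeIndex and Category as an ordinary column next to X, so further time-based operations can be applied directly.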

aitorhh answered Oct 21 '22 04:10