Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping by 30 minutes space in Pandas

Tags:

python

pandas

I have a sequence of one month of data and I wanted the average value at 00:00, 00:30, 01:00, ...23:30 for the whole month. If this were in an hourly basis I could simply do

df.groupby(df.index.hour).mean()

but I have no idea how to do this for a 30 min period. I have tried random things like

df.groupby(df.index.hour*df.index.minute).mean()

but none of them has worked. Can this be done in Pandas?

Thank you.

EDIT Sample data:

2015-06-01 00:00:00     4.474450  137.007017
2015-06-01 00:30:00     5.661688  138.342549
2015-06-01 01:00:00     6.142984  139.469381
2015-06-01 01:30:00     6.245277  140.780341
2015-06-01 02:00:00     6.368909  141.464176
2015-06-01 02:30:00     6.535648  143.121590
...                          ...         ...
2015-06-04 21:30:00     6.380301  123.523559
2015-06-04 22:00:00     6.118872  124.649216
2015-06-04 22:30:00     6.554864  127.671638
2015-06-04 23:00:00     7.628708  129.960442
2015-06-04 23:30:00     8.082754  132.294248
2015-06-04 00:00:00     7.768733  132.960135

Note that the the data spans more than a day, but I am looking for the result in as a 24*2 length array. Since the 00:30 data will be the average of the data for all the days in that time and so on.

like image 382
TomCho Avatar asked Feb 10 '23 23:02

TomCho


1 Answers

IIUC I think you want the following:

In [13]:
# load the data
t="""2015-06-01 00:00:00,4.474450,137.007017
2015-06-01 00:30:00,5.661688,138.342549
2015-06-01 01:00:00,6.142984,139.469381
2015-06-01 01:30:00,6.245277,140.780341
2015-06-01 02:00:00,6.368909,141.464176
2015-06-01 02:30:00,6.535648,143.121590
2015-06-04 21:30:00,6.380301,123.523559
2015-06-04 22:00:00,6.118872,124.649216
2015-06-04 22:30:00,6.554864,127.671638
2015-06-04 23:00:00,7.628708,129.960442
2015-06-04 23:30:00,8.082754,132.294248
2015-06-04 00:00:00,7.768733,132.960135"""
df = pd.read_csv(io.StringIO(t), index_col=[0], parse_dates=[0], header=None)
df.columns = ['x','y']

In [14]:
# group on the hour and minute attribute of the index
df.groupby([df.index.hour, df.index.minute]).mean()
Out[14]:
              x           y
0  0   6.121592  134.983576
   30  5.661688  138.342549
1  0   6.142984  139.469381
   30  6.245277  140.780341
2  0   6.368909  141.464176
   30  6.535648  143.121590
21 30  6.380301  123.523559
22 0   6.118872  124.649216
   30  6.554864  127.671638
23 0   7.628708  129.960442
   30  8.082754  132.294248

So the above will group on the index hour and minute attribute of your index, so this gives you the mean for all days that have a value at 00:30, 01:00 etc..

like image 63
EdChum Avatar answered Feb 12 '23 12:02

EdChum