I have a DataFrame like this. I would like to group it into 60-minute intervals, with the grouping starting at 06:30.
                         data
index
2017-02-14 06:29:57  11198648
2017-02-14 06:30:01  11198650
2017-02-14 06:37:22  11198706
2017-02-14 23:11:13  11207728
2017-02-14 23:21:43  11207774
2017-02-14 23:22:36  11207776
I am using:
df.groupby(pd.TimeGrouper(freq='60Min'))
I get this grouping:
                     data
index
2017-02-14 06:00:00    x1
2017-02-14 07:00:00    x2
2017-02-14 08:00:00    x3
2017-02-14 09:00:00    x4
2017-02-14 10:00:00    x5
but I am looking for this result:
                     data
index
2017-02-14 06:30:00    x1
2017-02-14 07:30:00    x2
2017-02-14 08:30:00    x3
2017-02-14 09:30:00    x4
2017-02-14 10:30:00    x5
How can I tell the function to start grouping at 6:30 at one-hour intervals?
If it cannot be done with .groupby(pd.TimeGrouper(freq='60Min')), what is the best way to do it?
Thanks very much in advance.
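Use base=30 in conjunction with label='right' in pd.Grouper.
base defaults to 0, which puts the bin edges on the hour (06:00, 07:00, ...). Setting base=30 shifts them by 30 minutes, to 05:30, 06:30, 07:30 and so on. Specifying label='right' labels each bin with its right (later) edge, so the first group, which covers 05:30 to 06:30, is reported as 06:30 rather than 05:30.
Note that base was deprecated in pandas 1.1 and removed in pandas 2.0. On those versions, pass offset='30min' instead of base=30.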
Suppose you want to take the first element of every group:
df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
# same thing using resample:
# df.resample('60Min', base=30, label='right').first()
yields:
                           data
index
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0
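For newer pandas versions, here is a minimal sketch of the same grouping, assuming pandas 1.1 or later, where offset='30min' takes over the role of base=30. The DataFrame is rebuilt from the sample rows in the question; the variable names right and left are only for illustration. It also shows how the label argument changes the timestamps in the result without changing the bins themselves.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=pd.to_datetime([
        "2017-02-14 06:29:57",
        "2017-02-14 06:30:01",
        "2017-02-14 06:37:22",
        "2017-02-14 23:11:13",
        "2017-02-14 23:21:43",
        "2017-02-14 23:22:36",
    ]),
)
df.index.name = "index"

# offset="30min" moves the hourly bin edges from hh:00 to hh:30
# (assumption: pandas >= 1.1, where offset replaces base).
# label="right" names each bin after its later edge.
right = df.resample("60min", offset="30min", label="right").first()

# Default label="left": identical bins, named after their earlier edge.
left = df.resample("60min", offset="30min").first()

print(right.head(2))
print(left.head(2))
```

With label='right' the row stamped 06:30 holds the value recorded at 06:29:57, because it describes the 05:30 to 06:30 bin. With the default label the same bin is stamped 05:30, and the row stamped 06:30 holds the first value recorded at or after 06:30.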
Alternatively, use DataFrame.resample, which is a dedicated method for resampling time series. This way we don't need DataFrame.groupby and pd.Grouper:
df.resample('60min', base=30, label='right').first()
Output:
                           data
index
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0
Note: when your DataFrame has multiple columns, select the column you want to aggregate. Otherwise the aggregation is applied to every column:
df.resample('60min', base=30, label='right')['data'].first()
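As a rough illustration of the column selection, here is a small sketch with a two-column frame. The column "other" and the names df2, first_data and per_column are made up for this example, and offset='30min' is used in place of base=30 on the assumption that pandas 1.1 or later is installed.

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-02-14 06:29:57",
    "2017-02-14 06:30:01",
    "2017-02-14 06:37:22",
])
# "other" is a hypothetical second column, added only for illustration.
df2 = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706], "other": [1.0, 2.0, 3.0]},
    index=idx,
)

# Hourly bins with edges at hh:30, labelled by their right edge
# (offset="30min" assumes pandas >= 1.1).
resampler = df2.resample("60min", offset="30min", label="right")

# Selecting one column gives a Series with one value per bin.
first_data = resampler["data"].first()

# agg with a dict applies a different aggregation to each column.
per_column = resampler.agg({"data": "first", "other": "sum"})

print(first_data)
print(per_column)
```

Selecting the column before aggregating keeps the result to a single Series, while the dict form of agg is one way to aggregate several columns differently in one pass.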