My pandas version is 0.18 and I have minute data that looks as follows:
Time
2009-01-30 09:30:00 85.11 100.11
2009-01-30 09:39:00 84.93 100.05
2009-01-30 09:40:00 84.90 100.00
2009-01-30 09:45:00 84.91 99.94
2009-01-30 09:48:00 84.81 99.90
2009-01-30 09:55:00 84.78 100.00
2009-01-30 09:56:00 84.57 100.10
2009-01-30 09:59:00 84.25 100.41
2009-01-30 10:00:00 84.32 100.60
2009-01-30 10:06:00 84.23 101.49
2009-01-30 10:09:00 84.15 101.47
I want to use only the data between 9:30 and 16:00 and resample it in 78-minute intervals (i.e. divide the time between 9:30 and 16:00 into 5 equal parts). My code looks as follows:
Data= Data.between_time('9:30','16:00')
tframe = '78T'
hourlym = Data.resample(tframe, base=30).mean()
The output:
Time
2009-01-30 08:18:00 85.110000 100.110000
2009-01-30 09:36:00 83.950645 101.984516
2009-01-30 10:54:00 83.372294 103.093824
2009-01-30 12:12:00 83.698624 102.566897
2009-01-30 13:30:00 83.224397 103.076667
2009-01-30 14:48:00 82.641167 104.114667
2009-01-30 16:06:00 NaN NaN
2009-01-30 17:24:00 NaN NaN
2009-01-30 18:42:00 NaN NaN
As you can see, pandas seems to ignore my base parameter and the output table starts from 8:18. I believe this is because pandas tries to split the whole day into 78-minute intervals, and since 24 hours cannot be divided evenly into 78-minute pieces, this weird behavior occurs. Is it possible to force pandas to start resampling specifically from 9:30 on the first day? Or to work only with specific hours while resampling?
The base argument is applied to midnight, so in your case the sampling starts from 00:30 and adds 78 min increments from there. I see two options.
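To make the arithmetic concrete, here is a small sketch using only the standard library. The helper name bin_edges is made up for this illustration and is not a pandas function; it simply steps forward from midnight plus the base, which is the behavior described above.

```python
from datetime import datetime, timedelta

def bin_edges(day, base_minutes, step_minutes, until):
    """List the bin edges from midnight + base, in fixed steps, up to `until`."""
    edge = day + timedelta(minutes=base_minutes)
    step = timedelta(minutes=step_minutes)
    edges = []
    while edge <= until:
        edges.append(edge)
        edge += step
    return edges

# base=30 and 78-minute bins, as in the question
edges = bin_edges(datetime(2009, 1, 30), 30, 78, datetime(2009, 1, 30, 10, 0))
labels = [e.strftime('%H:%M') for e in edges]
print(labels)
# ['00:30', '01:48', '03:06', '04:24', '05:42', '07:00', '08:18', '09:36']
```

The grid passes through 08:18 and 09:36 but never lands on 09:30, which matches the first two labels in the question's output.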
Option 1:
Figure out what the base applied to midnight should be in order to reach 9:30 (in this case 24):
Data.resample(tframe, base=24).mean()
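The value 24 is just the remainder of 9:30 (570 minutes after midnight) divided by 78. The sketch below computes it and applies it to synthetic minute data; the price column and its values are invented for the example. It assumes pandas 1.1 or later, where the offset argument takes over the role of base (base was deprecated in 1.1 and removed in 2.0), so on 0.18 you would pass base=shift instead.

```python
import pandas as pd

step = 78                  # bin width in minutes
target = 9 * 60 + 30       # 09:30 as minutes after midnight
shift = target % step      # shift of the midnight-anchored grid
print(shift)               # 24

# Synthetic minute data for one trading day (illustration only)
idx = pd.date_range('2009-01-30 09:30', '2009-01-30 15:59', freq='1min')
data = pd.DataFrame({'price': [float(i) for i in range(len(idx))]}, index=idx)

# offset plays the role that base=24 plays in older pandas versions
binned = data.resample('%dmin' % step, offset='%dmin' % shift).mean()
print(binned.index[0])
```

Note that the grid is still anchored to midnight of the first day only, so this lines up with 9:30 on the first day but not necessarily on later days.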
Option 2:
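Generate the DatetimeIndex yourself and select those timestamps with reindex (note that this picks the rows at exactly those times rather than averaging over each interval):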
index = pd.date_range('2009-01-30 09:30:00', '2009-01-30 16:00:00', freq='78min')
Data.reindex(index=index)
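As a quick check of what reindex returns, here is a sketch using the first three rows from the question. The column names a and b are placeholders, since the question does not show the real ones.

```python
import pandas as pd

# First rows of the question's data; column names 'a' and 'b' are placeholders
rows = {
    '2009-01-30 09:30:00': (85.11, 100.11),
    '2009-01-30 09:39:00': (84.93, 100.05),
    '2009-01-30 09:40:00': (84.90, 100.00),
}
data = pd.DataFrame.from_dict(rows, orient='index', columns=['a', 'b'])
data.index = pd.to_datetime(data.index)

# Six timestamps: 09:30, 10:48, 12:06, 13:24, 14:42, 16:00
index = pd.date_range('2009-01-30 09:30:00', '2009-01-30 16:00:00', freq='78min')

# reindex keeps the row found at each timestamp and fills NaN where none exists
picked = data.reindex(index=index)
print(picked)
```

Only the 09:30 row has a match here, so the remaining five rows come back as NaN. If you need the mean over each interval, resample is still the tool for that.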
EDIT: for multiple days you will need to generate the timestamps yourself.
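from datetime import datetime
import pandas as pd

# Dates of interest
index_date = pd.date_range('2016-04-01', '2016-04-04')
index_date = pd.Series(index_date)
# Intraday times of interest, 78 minutes apart
index_time = pd.date_range('09:30:00', '16:00:00', freq='78min')
index_time = pd.Series(index_time.time)
# Combine every date with every time, then flatten into one sorted Series
index = index_date.apply(
    lambda d: index_time.apply(
        lambda t: datetime.combine(d, t)
    )
).unstack().sort_values().reset_index(drop=True)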
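Here is what the code does:
It builds the dates and the intraday times separately, then combines every date with every time using two nested apply calls.
The result at that point is a table with one row per date and one column per time, so unstack and sort the timestamps (and finally reset the index to get rid of a useless index generated along the way).
The resulting index can be used to reindex as in option 2 originally: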
Data.reindex(index=index)
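If you want per-interval means on every day rather than the rows at fixed timestamps, one possible alternative is to resample each day separately so that the grid is re-anchored at each day's midnight. This is only a sketch under a few assumptions: pandas 1.1 or later (for offset), synthetic data with an invented price column, and data already restricted to trading hours.

```python
import pandas as pd

# Synthetic minute data for two trading days (illustration only)
days = ['2016-04-01', '2016-04-04']
parts = [pd.date_range(d + ' 09:30', d + ' 15:59', freq='1min') for d in days]
idx = parts[0].append(parts[1])
data = pd.DataFrame({'price': [float(i) for i in range(len(idx))]}, index=idx)

# Resample one calendar day at a time; 24 min is 9:30 modulo 78 min
per_day = [
    day.resample('78min', offset='24min').mean()
    for _, day in data.groupby(data.index.normalize())
]
binned = pd.concat(per_day)
print(binned)
```

Each day then gets its own bins starting at 09:30, and no bins are created for the nights or the days in between.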