
Python Pandas, Resampling only specific hours

Tags: python, pandas

My pandas version is 0.18, and I have minute-level data that looks as follows:

Time                              
2009-01-30 09:30:00  85.11  100.11
2009-01-30 09:39:00  84.93  100.05
2009-01-30 09:40:00  84.90  100.00
2009-01-30 09:45:00  84.91   99.94
2009-01-30 09:48:00  84.81   99.90
2009-01-30 09:55:00  84.78  100.00
2009-01-30 09:56:00  84.57  100.10
2009-01-30 09:59:00  84.25  100.41
2009-01-30 10:00:00  84.32  100.60
2009-01-30 10:06:00  84.23  101.49
2009-01-30 10:09:00  84.15  101.47

I want to use only the data between 9:30 and 16:00 and resample it in 78-minute intervals (i.e. divide the time between 9:30 and 16:00 into 5 equal parts). My code looks as follows:

Data = Data.between_time('9:30', '16:00')
tframe = '78T'
hourlym = Data.resample(tframe, base=30).mean()

The output:

Time                                      
2009-01-30 08:18:00  85.110000  100.110000
2009-01-30 09:36:00  83.950645  101.984516
2009-01-30 10:54:00  83.372294  103.093824
2009-01-30 12:12:00  83.698624  102.566897
2009-01-30 13:30:00  83.224397  103.076667
2009-01-30 14:48:00  82.641167  104.114667
2009-01-30 16:06:00        NaN         NaN
2009-01-30 17:24:00        NaN         NaN
2009-01-30 18:42:00        NaN         NaN

As you can see, pandas seems to ignore my base parameter and the output table starts at 8:18. I believe this is because pandas tries to split the whole day into 78-minute bins, and since 24 hours cannot be divided evenly into 78-minute pieces, this odd behavior occurs. Is it possible to force pandas to start resampling at exactly 9:30 on the first day? Or to work only with specific hours while resampling?

kroonike asked May 02 '16 07:05


1 Answer

The base argument is an offset measured from midnight, so in your case the bins start at 00:30 and advance in 78-minute increments from there. I see two options.
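To see why this produces the 08:18 and 09:36 labels from the question, here is a minimal sketch using only the standard library. It assumes the bin edges are simply midnight + base + k * 78 minutes; the variable names are mine, not part of pandas.

```python
from datetime import datetime, timedelta

midnight = datetime(2009, 1, 30)
base, step = 30, 78  # minutes, as in the question

# Bin edges are midnight + base + k * step; look at the ones around 9:30.
edges = [midnight + timedelta(minutes=base + k * step) for k in range(6, 9)]
labels = [e.strftime("%H:%M") for e in edges]
print(labels)
```

None of the edges falls on 09:30, which is why the first row of the output is labelled 08:18.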

Option 1:

Work out which base, counted from midnight, puts a bin edge on 9:30 (in this case 24, since 9:30 is 570 minutes after midnight and 570 mod 78 = 24):

Data.resample(tframe, base=24).mean()
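Note for readers on newer pandas: base was deprecated in pandas 1.1 and removed in 2.0, and the offset keyword of resample plays the same role. A hedged sketch of option 1 in that form, using made-up minute data rather than the asker's:

```python
import numpy as np
import pandas as pd

# 9:30 is 570 minutes after midnight; the remainder mod 78 is the shift needed.
base = (9 * 60 + 30) % 78  # 24

# Synthetic one-day minute data (values are invented for illustration).
idx = pd.date_range("2009-01-30 09:30", "2009-01-30 16:00", freq="1min")
data = pd.DataFrame({"price": np.linspace(85.0, 82.0, len(idx))}, index=idx)

# offset shifts the bin edges away from midnight, like base did in 0.18.
resampled = data.resample("78min", offset=pd.Timedelta(minutes=base)).mean()
print(resampled)
```

The 16:00 row ends up alone in a sixth bin, so drop it first if you want exactly five intervals.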

Option 2:

Generate the DatetimeIndex yourself, and reindex the data onto it:

index = pd.date_range('2009-01-30 09:30:00', '2009-01-30 16:00:00', freq='78min')
Data.reindex(index=index)
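One caveat worth keeping in mind: reindex only picks the rows that sit exactly on the new timestamps; it does not average over each 78-minute window the way resample(...).mean() does. A small sketch with three rows taken from the question (the column name "a" is my own placeholder):

```python
import pandas as pd

# Three rows from the question's data; "a" is a placeholder column name.
data = pd.DataFrame(
    {"a": [85.11, 84.93, 84.32]},
    index=pd.to_datetime(
        ["2009-01-30 09:30", "2009-01-30 09:39", "2009-01-30 10:00"]
    ),
)

index = pd.date_range("2009-01-30 09:30:00", "2009-01-30 16:00:00", freq="78min")

# Only 09:30 exists in the data, so every other new timestamp becomes NaN.
picked = data.reindex(index)
print(picked)
```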

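EDIT: for multiple days you will need to generate the timestamps yourself, because 24 hours is not a multiple of 78 minutes (1440 mod 78 = 36), so bins anchored to the first midnight drift away from 9:30 on later days.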

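from datetime import datetime

import pandas as pd

# Dates of interest, as a Series so that .apply is available
index_date = pd.date_range('2016-04-01', '2016-04-04')
index_date = pd.Series(index_date)
# Times of day from 9:30 to 16:00 in 78-minute steps
index_time = pd.date_range('09:30:00', '16:00:00', freq='78min')
index_time = pd.Series(index_time.time)

# Combine every date with every time, then flatten and sort
index = index_date.apply(
    lambda d: index_time.apply(
        lambda t: datetime.combine(d, t)
        )
    ).unstack().sort_values().reset_index(drop=True)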

Here is what the code does:

  • Generate the dates and times you're interested in, and turn them into Series so that they have the apply method.
  • Using nested calls to apply, loop over the dates and times and combine each pair into a datetime object.
  • The output is a DataFrame (one row per date, one column per time), so I unstack it and sort the timestamps (and finally reset the index to get rid of a useless index generated along the way).

The resulting index can then be used to reindex, as in the original option 2:

Data.reindex(index=index)
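If you want per-bin means over several days rather than the values at the bin starts, one possible alternative (not part of the original answer) is to compute the bin start for each row yourself and group on it. This is only a sketch on synthetic data; the names start, stop, step and bin_start are my own.

```python
import numpy as np
import pandas as pd

# Synthetic minute data spanning two full days (values are invented).
idx = pd.date_range("2016-04-01 00:00", "2016-04-02 23:59", freq="1min")
data = pd.DataFrame({"price": np.arange(len(idx), dtype=float)}, index=idx)

start, stop, step = 9 * 60 + 30, 16 * 60, 78  # minutes after midnight

# Minutes after midnight for every row; keep 09:30 <= t < 16:00 only.
minutes = np.asarray(data.index.hour * 60 + data.index.minute)
keep = (minutes >= start) & (minutes < stop)
session, minutes = data[keep], minutes[keep]

# Start of the 78-minute bin each row falls into, restarting at 9:30 every day.
bin_no = (minutes - start) // step
bin_start = session.index.normalize() + pd.to_timedelta(
    start + bin_no * step, unit="min"
)

per_day = session.groupby(bin_start).mean()
print(per_day)
```

Because the bins restart at 9:30 on each date, every day gets the same five intervals regardless of how many days the data covers.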
IanS answered Sep 22 '22 04:09