Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Resampling Minute data

Tags:

python

pandas

I have minute based OHLCV data for the opening range/first hour (9:30-10:30 AM EST). I'm looking to resample this data so I can get one 60-minute value and then calculate the range.

When I call the dataframe.resample() function on the data I get two rows and the initial row starts at 9:00 AM. I'm looking to get only one row which starts at 9:30 AM.

Note: the initial data begins at 9:30.

enter image description here

Edit: Adding code:

# Extract data for regular trading hours (rth) from the 24 hour data set rth = data.between_time(start_time = '09:30:00', end_time = '16:15:00', include_end = False)  # Extract data for extended trading hours (eth) from the 24 hour data set eth = data.between_time(start_time = '16:30:00', end_time = '09:30:00', include_end = False)  # Extract data for initial balance (rth) from the 24 hour data set initial_balance = data.between_time(start_time = '09:30:00', end_time = '10:30:00', include_end =      False) 

Got stuck tried to separate the opening range by individual date and get the Initial Balance

conversion = {'Open' : 'first', 'High' : 'max', 'Low' : 'min', 'Close' : 'last', 'Volume' : 'sum'} sample = data.between_time(start_time = '09:30:00', end_time = '10:30:00', include_end = False) sample = sample.ix['2007-05-07'] sample.tail()  sample.resample('60Min', how = conversion)  

By default resample starts at the beggining of the hour. I would like it to start from where the data starts.

like image 246
aozkan Avatar asked Feb 13 '13 19:02

aozkan


People also ask

What is resampling time series data?

Resample time-series data. Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index ( DatetimeIndex , PeriodIndex , or TimedeltaIndex ), or the caller must pass the label of a datetime-like series/index to the on / level keyword parameter.

How do I resample data in pandas?

Resample time-series data. Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index ( DatetimeIndex , PeriodIndex , or TimedeltaIndex ), or the caller must pass the label of a datetime-like series/index to the on / level keyword parameter.

How do you resample hourly data in Python?

Resample Hourly Data to Daily Data To simplify your plot which has a lot of data points due to the hourly records, you can aggregate the data for each day using the . resample() method. To aggregate or temporal resample the data for a time period, you can take all of the values for each day and summarize them.


2 Answers

You can use the base argument of resample:

sample.resample('60Min', how=conversion, base=30) 

From the above docs-link:

base : int, default 0
    For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals.
    For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0

like image 106
Andy Hayden Avatar answered Sep 20 '22 15:09

Andy Hayden


value is the column you want to aggregate, resample the dataframe dates by second and aggregate by mean, then drop the nan rows.

data=[('2014-02-24 16:16:47.204000',    1.391424) ,('2014-02-24 16:18:48.296000',    1.048143) ,('2014-02-24 16:19:52.346000',  -0.823974) ,('2014-02-24 16:22:13.665000',   -0.689560) ,('2014-02-24 16:24:13.760000',   -0.323252) ,('2014-02-24 16:26:15.155000',   -1.095331) ,('2014-02-24 16:29:58.235000',   -0.185681)]  df=pd.DataFrame(data,columns=['Date','Value']) df['Date']=pd.to_datetime(df['Date']) minutes=df.resample('1Min',on='Date').mean().dropna()  print(minutes) 

output:

Value Date                          2014-02-24 16:16:00  1.391424 2014-02-24 16:18:00  1.048143 2014-02-24 16:19:00 -0.823974 2014-02-24 16:22:00 -0.689560 2014-02-24 16:24:00 -0.323252 2014-02-24 16:26:00 -1.095331 2014-02-24 16:29:00 -0.185681 
like image 24
Golden Lion Avatar answered Sep 18 '22 15:09

Golden Lion