 

Pandas OHLC aggregation on OHLC data

I understand that OHLC resampling of time-series data in pandas works perfectly when there is only one column of data, for example on the following DataFrame:

    >>> df
    ctime       openbid
    1443654000  1.11700
    1443654060  1.11700
    ...

    df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
    df = df.set_index('ctime')
    # Resample the single price column; bfill() fills empty bars from the next bar
    df['openbid'].resample('1H').ohlc().bfill()

                            open     high      low    close
    ctime
    2015-09-30 23:00:00  1.11700  1.11700  1.11687  1.11697
    2015-10-01 00:00:00  1.11700  1.11712  1.11697  1.11697
    ...
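
A self-contained sketch of the single-column case, using a few made-up rows in the question's layout (the prices and the extra midnight row are hypothetical, and the hourly bin is written as "60min"). Resampling a Series with .ohlc() is what produces the flat open/high/low/close columns:

```python
import pandas as pd

# Hypothetical tick data in the question's layout: epoch seconds plus one price column.
df = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443654120, 1443657600],
    "openbid": [1.11700, 1.11712, 1.11687, 1.11697],
})

# Convert epoch seconds to timestamps and use them as the index.
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

# Resampling a single Series with .ohlc() gives flat open/high/low/close columns.
bars = df["openbid"].resample("60min").ohlc()
print(bars)
```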

But what do I do if the data is already in OHLC format? From what I can gather, the ohlc method calculates a separate OHLC summary for every column, so if my data is in the format:

                 ctime  openbid  highbid   lowbid  closebid
    0       1443654000  1.11700  1.11700  1.11687   1.11697
    1       1443654060  1.11700  1.11712  1.11697   1.11697
    2       1443654120  1.11701  1.11708  1.11699   1.11708

When I try to resample, I get an OHLC summary for each of the columns, like so:

                         openbid                             highbid           \
                            open     high      low    close     open     high
    ctime
    2015-09-30 23:00:00  1.11700  1.11700  1.11700  1.11700  1.11700  1.11712
    2015-09-30 23:01:00  1.11701  1.11701  1.11701  1.11701  1.11708  1.11708
    ...

                                             lowbid                             \
                             low    close     open     high      low    close
    ctime
    2015-09-30 23:00:00  1.11700  1.11712  1.11687  1.11697  1.11687  1.11697
    2015-09-30 23:01:00  1.11708  1.11708  1.11699  1.11699  1.11699  1.11699
    ...

                         closebid
                            open     high      low    close
    ctime
    2015-09-30 23:00:00  1.11697  1.11697  1.11697  1.11697
    2015-09-30 23:01:00  1.11708  1.11708  1.11708  1.11708
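
To reproduce this, here is a minimal sketch built from the three sample rows above, resampled into hourly bins (written as "60min"; the bin size is my assumption). Calling .ohlc() on the whole DataFrame summarises every column independently, which is where the two-level column index comes from:

```python
import pandas as pd

# The question's three sample rows, already in OHLC form.
df = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443654120],
    "openbid": [1.11700, 1.11700, 1.11701],
    "highbid": [1.11700, 1.11712, 1.11708],
    "lowbid": [1.11687, 1.11697, 1.11699],
    "closebid": [1.11697, 1.11697, 1.11708],
})
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

# DataFrame.resample(...).ohlc() summarises EVERY column separately,
# producing a two-level column index: (source column, open/high/low/close).
per_column = df.resample("60min").ohlc()
print(per_column.columns.tolist()[:4])
```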

Is there a quick(ish) workaround for this that someone is willing to share, without me having to get knee-deep in the pandas manual?

Thanks.

PS: there is this answer, "Converting OHLC stock data into a different timeframe with python and pandas", but it was posted 4 years ago, so I am hoping there has been some progress.

asked Mar 25 '16 15:03 by user3439187





2 Answers

This is similar to the answer you linked, but it's a little cleaner and faster, because it uses the optimized built-in aggregations rather than lambdas.

Note that the resample(...).agg(...) syntax requires pandas 0.18.0 or later.

    In [101]: df.resample('1H').agg({'openbid': 'first',
                                     'highbid': 'max',
                                     'lowbid': 'min',
                                     'closebid': 'last'})
    Out[101]:
                          lowbid  highbid  closebid  openbid
    ctime
    2015-09-30 23:00:00  1.11687  1.11712   1.11708    1.117
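
For reference, a self-contained version of the same idea on the question's three sample rows (the hourly "60min" bin is an assumption). It spells out what each rule does and selects the columns afterwards, so the output keeps the open/high/low/close order rather than the shuffled order shown above:

```python
import pandas as pd

# The question's three sample rows, already in OHLC form.
df = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443654120],
    "openbid": [1.11700, 1.11700, 1.11701],
    "highbid": [1.11700, 1.11712, 1.11708],
    "lowbid": [1.11687, 1.11697, 1.11699],
    "closebid": [1.11697, 1.11697, 1.11708],
})
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

# How each existing OHLC column is combined into a coarser bar.
ohlc_rules = {
    "openbid": "first",  # open of the first sub-bar in the bin
    "highbid": "max",    # highest high across the sub-bars
    "lowbid": "min",     # lowest low across the sub-bars
    "closebid": "last",  # close of the last sub-bar in the bin
}

# Aggregate, then select the columns explicitly so their order is fixed.
hourly = df.resample("60min").agg(ohlc_rules)[list(ohlc_rules)]
print(hourly)
```

If a bin contains no rows, every column for that bar comes out as NaN; appending .dropna() discards such bars.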
answered Sep 22 '22 04:09 by chrisb



To keep the output columns in the order you list them, pass an OrderedDict (on older Python versions a plain dict did not preserve insertion order), like so:

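    import pandas as pd
    from collections import OrderedDict

    # Assumes columns named open/high/low/close/volume and epoch seconds in 'ctime'
    df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
    df = df.set_index('ctime')
    df = df.resample('5Min').agg(
        OrderedDict([
            ('open', 'first'),   # first open in each bin
            ('high', 'max'),     # highest high in each bin
            ('low', 'min'),      # lowest low in each bin
            ('close', 'last'),   # last close in each bin
            ('volume', 'sum'),   # total volume in each bin
        ])
    )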
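
A runnable version of this answer on hypothetical 1-minute bars (the values below are made up; the column names are the ones the answer assumes). Selecting [list(rules)] after agg pins the column order regardless of how the mapping is stored:

```python
import pandas as pd
from collections import OrderedDict

# Hypothetical 1-minute bars using the column names this answer assumes.
df = pd.DataFrame({
    "ctime": [1443654000 + 60 * i for i in range(6)],
    "open":   [1.1170, 1.1171, 1.1172, 1.1173, 1.1174, 1.1175],
    "high":   [1.1172, 1.1173, 1.1176, 1.1175, 1.1176, 1.1177],
    "low":    [1.1169, 1.1170, 1.1171, 1.1172, 1.1173, 1.1168],
    "close":  [1.1171, 1.1172, 1.1173, 1.1174, 1.1175, 1.1176],
    "volume": [10, 20, 30, 40, 50, 60],
})
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

rules = OrderedDict([
    ("open", "first"),
    ("high", "max"),
    ("low", "min"),
    ("close", "last"),
    ("volume", "sum"),
])

# 23:00-23:04 fall in the first 5-minute bin, 23:05 starts the second one.
bars = df.resample("5min").agg(rules)[list(rules)]
print(bars)
```

Since Python 3.7 a plain dict also preserves insertion order, so the OrderedDict mainly matters on older interpreters.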
answered Sep 22 '22 04:09 by Benjamin Crouzier
