I understand that OHLC re-sampling of time series data in Pandas, using one column of data, will work perfectly, for example on the following dataframe:
>>df ctime openbid 1443654000 1.11700 1443654060 1.11700 ... df['ctime'] = pd.to_datetime(df['ctime'], unit='s') df = df.set_index('ctime') df.resample('1H', how='ohlc', axis=0, fill_method='bfill') >>> open high low close ctime 2015-09-30 23:00:00 1.11700 1.11700 1.11687 1.11697 2015-09-30 24:00:00 1.11700 1.11712 1.11697 1.11697 ...
But what do I do if the data is already in an OHLC format? From what I can gather the OHLC method of the API calculates an OHLC slice for every column, hence if my data is in the format:
ctime openbid highbid lowbid closebid 0 1443654000 1.11700 1.11700 1.11687 1.11697 1 1443654060 1.11700 1.11712 1.11697 1.11697 2 1443654120 1.11701 1.11708 1.11699 1.11708
When I try to re-sample I get an OHLC for each of the columns, like so:
openbid highbid \ open high low close open high ctime 2015-09-30 23:00:00 1.11700 1.11700 1.11700 1.11700 1.11700 1.11712 2015-09-30 23:01:00 1.11701 1.11701 1.11701 1.11701 1.11708 1.11708 ... lowbid \ low close open high low close ctime 2015-09-30 23:00:00 1.11700 1.11712 1.11687 1.11697 1.11687 1.11697 2015-09-30 23:01:00 1.11708 1.11708 1.11699 1.11699 1.11699 1.11699 ... closebid open high low close ctime 2015-09-30 23:00:00 1.11697 1.11697 1.11697 1.11697 2015-09-30 23:01:00 1.11708 1.11708 1.11708 1.11708
Is there a quick(ish) workaround for this that someone is willing to share please, without me having to get knee-deep in pandas manual?
Thanks.
ps, there is this answer - Converting OHLC stock data into a different timeframe with python and pandas - but it was 4 years ago, so I am hoping there has been some progress.
Method 1: using Python for-loops. Function new_case_count() takes in DataFrame object, iterates over it and converts indexes, which are dates in string format, to Pandas Datetime format. Based on the date's day of the week, each week's new cases count is calculated and stored in a list.
Pandas DataFrame aggregate() MethodThe aggregate() method allows you to apply a function or a list of function names to be executed along one of the axis of the DataFrame, default 0, which is the index (row) axis. Note: the agg() method is an alias of the aggregate() method.
To view the data in the Pandas DataFrame previously loaded, select the Data Viewer icon to the left of the data variable.
This is similar to the answer you linked, but it a little cleaner, and faster, because it uses the optimized aggregations, rather than lambdas.
Note that the resample(...).agg(...)
syntax requires pandas version 0.18.0
.
In [101]: df.resample('1H').agg({'openbid': 'first', 'highbid': 'max', 'lowbid': 'min', 'closebid': 'last'}) Out[101]: lowbid highbid closebid openbid ctime 2015-09-30 23:00:00 1.11687 1.11712 1.11708 1.117
You need to use an OrderedDict to keep row order in the newer versions of pandas, like so:
import pandas as pd from collections import OrderedDict df['ctime'] = pd.to_datetime(df['ctime'], unit='s') df = df.set_index('ctime') df = df.resample('5Min').agg( OrderedDict([ ('open', 'first'), ('high', 'max'), ('low', 'min'), ('close', 'last'), ('volume', 'sum'), ]) )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With