Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Resampling trade data into OHLCV with pandas

Tags:

python

pandas

I have historical trade data in a pandas DataFrame, containing price and volume columns, indexed by a DateTimeIndex.

For example:

>>> print df.tail()
                             price  volume
2014-01-15 14:29:54+00:00  949.975    0.01
2014-01-15 14:29:59+00:00  941.370    0.01
2014-01-15 14:30:17+00:00  949.975    0.01
2014-01-15 14:30:24+00:00  941.370    0.01
2014-01-15 14:30:36+00:00  949.975    0.01

Now, I can resample this into OHLC data using df.resample(freq, how={'price': 'ohlc'}), which is fine, but I'd also like to include the volume.

When I try df.resample(freq, how={'price': 'ohlc', 'volume': 'sum'}), I get:

ValueError: Shape of passed values is (2,), indices imply (2, 95)

I'm not quite sure what is wrong with my dataset, or why this fails. Could anyone help shed some light on this? Much appreciated.

like image 311
jespern Avatar asked Jan 15 '14 14:01

jespern


2 Answers

The problem isn't with the resampling, it's from trying to concat a MultiIndex (from the price OHLC), with a regular index (for the Volume sum).

In [17]: df
Out[17]: 
                       price  volume
2014-01-15 14:29:54  949.975    0.01
2014-01-15 14:29:59  941.370    0.01
2014-01-15 14:30:17  949.975    0.01
2014-01-15 14:30:24  941.370    0.01
2014-01-15 14:30:36  949.975    0.01

[5 rows x 2 columns]

In [18]: df.resample('30s', how={'price': 'ohlc'})  # Note the MultiIndex
Out[18]: 
                       price                           
                        open     high      low    close
2014-01-15 14:29:30  949.975  949.975  941.370  941.370
2014-01-15 14:30:00  949.975  949.975  941.370  941.370
2014-01-15 14:30:30  949.975  949.975  949.975  949.975

[3 rows x 4 columns]

In [19]: df.resample('30s', how={'volume': 'sum'})  # Regular Index for columns
Out[19]: 
                     volume
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]

I guess you could manually create a MultiIndex for (volume, sum) and then concat:

In [34]: vol = df.resample('30s', how={'volume': 'sum'})

In [35]: vol.columns = pd.MultiIndex.from_tuples([('volume', 'sum')])

In [36]: vol
Out[36]: 
                     volume
                        sum
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]

In [37]: price = df.resample('30s', how={'price': 'ohlc'})

In [38]: pd.concat([price, vol], axis=1)
Out[38]: 
                       price                             volume
                        open     high      low    close     sum
2014-01-15 14:29:30  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:00  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:30  949.975  949.975  949.975  949.975    0.01

[3 rows x 5 columns]

But it might be better if resample could handle this automatically.

like image 117
TomAugspurger Avatar answered Nov 01 '22 06:11

TomAugspurger


You can now do this in later versions of Pandas Example: Pandas version 0.22.00 df.resample('30S').mean()

like image 34
campervancoder Avatar answered Nov 01 '22 06:11

campervancoder