I have historical trade data in a pandas DataFrame, containing price and volume columns, indexed by a DateTimeIndex.
For example:
>>> print df.tail()
price volume
2014-01-15 14:29:54+00:00 949.975 0.01
2014-01-15 14:29:59+00:00 941.370 0.01
2014-01-15 14:30:17+00:00 949.975 0.01
2014-01-15 14:30:24+00:00 941.370 0.01
2014-01-15 14:30:36+00:00 949.975 0.01
Now, I can resample this into OHLC data using df.resample(freq, how={'price': 'ohlc'})
, which is fine, but I'd also like to include the volume.
When I try df.resample(freq, how={'price': 'ohlc', 'volume': 'sum'})
, I get:
ValueError: Shape of passed values is (2,), indices imply (2, 95)
I'm not quite sure what is wrong with my dataset, or why this fails. Could anyone help shed some light on this? Much appreciated.
The problem isn't with the resampling, it's from trying to concat a MultiIndex (from the price OHLC), with a regular index (for the Volume sum).
In [17]: df
Out[17]:
price volume
2014-01-15 14:29:54 949.975 0.01
2014-01-15 14:29:59 941.370 0.01
2014-01-15 14:30:17 949.975 0.01
2014-01-15 14:30:24 941.370 0.01
2014-01-15 14:30:36 949.975 0.01
[5 rows x 2 columns]
In [18]: df.resample('30s', how={'price': 'ohlc'}) # Note the MultiIndex
Out[18]:
price
open high low close
2014-01-15 14:29:30 949.975 949.975 941.370 941.370
2014-01-15 14:30:00 949.975 949.975 941.370 941.370
2014-01-15 14:30:30 949.975 949.975 949.975 949.975
[3 rows x 4 columns]
In [19]: df.resample('30s', how={'volume': 'sum'}) # Regular Index for columns
Out[19]:
volume
2014-01-15 14:29:30 0.02
2014-01-15 14:30:00 0.02
2014-01-15 14:30:30 0.01
[3 rows x 1 columns]
I guess you could manually create a MultiIndex for (volume, sum)
and then concat:
In [34]: vol = df.resample('30s', how={'volume': 'sum'})
In [35]: vol.columns = pd.MultiIndex.from_tuples([('volume', 'sum')])
In [36]: vol
Out[36]:
volume
sum
2014-01-15 14:29:30 0.02
2014-01-15 14:30:00 0.02
2014-01-15 14:30:30 0.01
[3 rows x 1 columns]
In [37]: price = df.resample('30s', how={'price': 'ohlc'})
In [38]: pd.concat([price, vol], axis=1)
Out[38]:
price volume
open high low close sum
2014-01-15 14:29:30 949.975 949.975 941.370 941.370 0.02
2014-01-15 14:30:00 949.975 949.975 941.370 941.370 0.02
2014-01-15 14:30:30 949.975 949.975 949.975 949.975 0.01
[3 rows x 5 columns]
But it might be better if resample could handle this automatically.
You can now do this in later versions of Pandas
Example: Pandas version 0.22.00 df.resample('30S').mean()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With