Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Resampling error: Only valid with DatetimeIndex or PeriodIndex

Tags:

python

pandas

When using panda's resample function on a DataFrame in order to convert tick data to OHLCV, a resampling error is encountered.

How should we solve the error?

data = pd.read_csv('tickdata.csv', header=None, names=['Timestamp','Price','Volume']).set_index('Timestamp') data.head() 

enter image description here

# Resample data into 30min bins ticks = data.ix[:, ['Price', 'Volume']] bars = ticks.Price.resample('30min', how='ohlc') volumes = ticks.Volume.resample('30min', how='sum') 

This gives the error:

TypeError: Only valid with DatetimeIndex or PeriodIndex 
like image 606
Nyxynyx Avatar asked Jun 16 '15 01:06

Nyxynyx


1 Answers

Convert the integer timestamps in the index to a DatetimeIndex:

data.index = pd.to_datetime(data.index, unit='s') 

This interprets the integers as seconds since the Epoch.


For example, given

data = pd.DataFrame(     {'Timestamp':[1313331280, 1313334917, 1313334917, 1313340309, 1313340309],       'Price': [10.4]*3 + [10.5]*2, 'Volume': [0.779, 0.101, 0.316, 0.150, 1.8]}) data = data.set_index(['Timestamp']) #             Price  Volume # Timestamp                 # 1313331280   10.4   0.779 # 1313334917   10.4   0.101 # 1313334917   10.4   0.316 # 1313340309   10.5   0.150 # 1313340309   10.5   1.800  data.index = pd.to_datetime(data.index, unit='s') 

yields

                     Price  Volume 2011-08-14 14:14:40   10.4   0.779 2011-08-14 15:15:17   10.4   0.101 2011-08-14 15:15:17   10.4   0.316 2011-08-14 16:45:09   10.5   0.150 2011-08-14 16:45:09   10.5   1.800 

Then

ticks = data.ix[:, ['Price', 'Volume']] bars = ticks.Price.resample('30min').ohlc() volumes = ticks.Volume.resample('30min').sum() 

can be computed:

In [368]: bars Out[368]:                       open  high   low  close 2011-08-14 14:00:00  10.4  10.4  10.4   10.4 2011-08-14 14:30:00   NaN   NaN   NaN    NaN 2011-08-14 15:00:00  10.4  10.4  10.4   10.4 2011-08-14 15:30:00   NaN   NaN   NaN    NaN 2011-08-14 16:00:00   NaN   NaN   NaN    NaN 2011-08-14 16:30:00  10.5  10.5  10.5   10.5  In [369]: volumes Out[369]:  2011-08-14 14:00:00    0.779 2011-08-14 14:30:00      NaN 2011-08-14 15:00:00    0.417 2011-08-14 15:30:00      NaN 2011-08-14 16:00:00      NaN 2011-08-14 16:30:00    1.950 Freq: 30T, Name: Volume, dtype: float64 
like image 188
unutbu Avatar answered Sep 22 '22 04:09

unutbu