I'm trying to process CSV files like this:
df = pd.read_csv("raw_hl.csv", index_col='time', parse_dates=True)
df.head(2)
                         high       low
time
2014-01-01 17:00:00  1.376235  1.375945
2014-01-01 17:01:00  1.376005  1.375775
2014-01-01 17:02:00  1.375795  1.375445
2014-01-01 17:07:00       NaN       NaN
...
2014-01-01 17:49:00  1.375645  1.375445
type(df.index)
pandas.tseries.index.DatetimeIndex
But these don't automatically have a frequency:
print(df.index.freq)
None
In case they have differing frequencies, it would be handy to be able to set one automatically. The simplest way would be to compare the first two rows:
tdelta = df.index[1] - df.index[0]
tdelta
datetime.timedelta(0, 60)
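Comparing only the first two rows can mislead when the file has gaps (such as the jump to 17:07 above). A minimal sketch, using made-up timestamps rather than the real file, that looks at every consecutive step and picks the most common one:

```python
import pandas as pd

# Hypothetical timestamps standing in for the index of raw_hl.csv,
# including a gap between 17:02 and 17:07.
idx = pd.to_datetime([
    "2014-01-01 17:00:00",
    "2014-01-01 17:01:00",
    "2014-01-01 17:02:00",
    "2014-01-01 17:07:00",
    "2014-01-01 17:08:00",
])

# Differences between consecutive timestamps (the first entry is NaT, so drop it).
steps = pd.Series(idx).diff().dropna()

# The most frequent step is a reasonable guess at the intended frequency.
most_common_step = steps.mode().iloc[0]

# The index is only truly regular if every step is identical.
is_regular = steps.nunique() == 1

print(most_common_step, is_regular)
```

Here the guess is one minute, but is_regular is False, which is a hint that the index cannot simply be stamped with that frequency.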
So far so good, but setting frequency directly to this timedelta fails:
df.index.freq = tdelta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-3f24abacf9de> in <module>()
----> 1 df.index.freq = tdelta
AttributeError: can't set attribute
Is there a way (ideally relatively painless!) to do this?
ANSWER: pandas gives the index an inferred_freq attribute, separate from freq, perhaps to avoid overwriting a user-defined frequency. For this data, df.index.inferred_freq is 'T' (one minute).
So it just seems to be a matter of using this instead of df.index.freq. Thanks to Jeff, who also provides more details below :)
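As a sketch of how the inferred frequency might then be put to use (assuming a reasonably recent pandas; the sample rows are made up, and whether the alias comes back as 'T' or 'min' depends on the pandas version), DataFrame.asfreq returns a frame whose index carries that frequency:

```python
import pandas as pd

# Hypothetical, gap-free sample standing in for the CSV data.
idx = pd.to_datetime([
    "2014-01-01 17:00:00",
    "2014-01-01 17:01:00",
    "2014-01-01 17:02:00",
    "2014-01-01 17:03:00",
])
df = pd.DataFrame({"high": [1.376235, 1.376005, 1.375795, 1.375645]}, index=idx)

print(df.index.freq)  # None: nothing was declared when the index was built

# inferred_freq is computed from the timestamps themselves;
# it is None when the spacing is irregular.
inferred = df.index.inferred_freq
if inferred is None:
    raise ValueError("index is irregular; no frequency could be inferred")

# asfreq conforms the frame to that frequency and sets freq on the new index.
regular = df.asfreq(inferred)
print(regular.index.freq)
```

If the index had gaps, inferred_freq would be None and the frame would need to be regularised first.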
If you have a regular frequency, it will be reported when you look at df.index.freq:
In [20]: df = pd.DataFrame({'A': np.arange(5)}, index=pd.date_range('20130101 09:00:00', freq='3T', periods=5))
In [21]: df
Out[21]:
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:06:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4
In [22]: df.index.freq
Out[22]: <3 * Minutes>
Having an irregular frequency will return None:
In [23]: df.index = df.index[0:2].tolist() + [pd.Timestamp('20130101 09:05:00')] + df.index[-2:].tolist()
In [24]: df
Out[24]:
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:05:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4
In [25]: df.index.freq
You can recover a regular frequency by resampling to a finer frequency (one where no values overlap), forward filling, then reindexing to the desired frequency and end points:
In [31]: df.resample('T').ffill().reindex(pd.date_range(df.index[0],df.index[-1],freq='3T'))
Out[31]:
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:06:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4
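A possible one-step alternative, not part of the answer above: because the index is sorted, reindex can forward fill on its own via method='ffill', taking the last known value at or before each target timestamp. A self-contained sketch that rebuilds the irregular frame from the example (written with '3min', an equivalent spelling of '3T'):

```python
import numpy as np
import pandas as pd

# Rebuild the irregular index from the example: 09:05 replaces 09:06.
irregular = pd.to_datetime([
    "2013-01-01 09:00:00",
    "2013-01-01 09:03:00",
    "2013-01-01 09:05:00",
    "2013-01-01 09:09:00",
    "2013-01-01 09:12:00",
])
df = pd.DataFrame({"A": np.arange(5)}, index=irregular)

# The regular grid we want, spanning the same end points.
target = pd.date_range(df.index[0], df.index[-1], freq="3min")

# For each target timestamp, take the most recent row at or before it.
regular = df.reindex(target, method="ffill")

print(regular)
print(regular.index.freq)
```

The resample-based version makes the intermediate one-minute grid explicit, which can be easier to inspect; this variant skips that grid and goes straight to the target index.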