 

Difference of a pandas.DatetimeIndex without a frequency

Irregular time series data is stored in a pandas.DataFrame with a DatetimeIndex set. I need the time difference between consecutive entries in the index.

I thought it would be as simple as

data.index.diff()

but got

AttributeError: 'DatetimeIndex' object has no attribute 'diff'

I tried

data.index - data.index.shift(1)

but got

ValueError: Cannot shift with no freq

I do not want to infer or enforce a frequency before doing this operation. The time series has large gaps that would be expanded into long runs of NaN, and the point is to find these gaps first.

So, what is a clean way to do this seemingly simple operation?

asked Mar 14 '18 12:03 by clstaudt




1 Answer

Index had no diff method when this was asked (pandas 2.1 and later provide Index.diff).

However, you can convert the index to a Series first. Use Index.to_series if you need to keep the original timestamps as the index of the result. If a default integer index is enough, pass the index to the Series constructor without an index argument.

Code example:

import pandas as pd

# Small irregular sample: no frequency is set on the index
rng = pd.to_datetime(['2015-01-10', '2015-01-12', '2015-01-13'])
data = pd.DataFrame({'a': range(3)}, index=rng)
print(data)
            a
2015-01-10  0
2015-01-12  1
2015-01-13  2

# Option 1: keep the timestamps as the index of the result
a = data.index.to_series().diff()
print(a)
2015-01-10      NaT
2015-01-12   2 days
2015-01-13   1 days
dtype: timedelta64[ns]

# Option 2: default integer index
a = pd.Series(data.index).diff()
print(a)
0      NaT
1   2 days
2   1 days
dtype: timedelta64[ns]
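
Since the goal in the question is to locate large gaps, the differences can be compared against a threshold. The following is only a sketch: the sample timestamps and the one-hour threshold are assumptions made up for illustration, not values from the question. It also shows an alternative that skips the Series conversion by subtracting the index from a copy of itself offset by one position.

```python
import pandas as pd

# Hypothetical irregular timestamps with one large gap (illustrative data only).
idx = pd.to_datetime([
    "2015-01-10 00:00",
    "2015-01-10 00:05",
    "2015-01-10 00:10",
    "2015-01-12 08:00",
    "2015-01-12 08:05",
])
data = pd.DataFrame({"a": range(len(idx))}, index=idx)

# Time elapsed since the previous entry; the first value is NaT.
deltas = data.index.to_series().diff()

# Assumed threshold: anything longer than one hour counts as a gap.
threshold = pd.Timedelta(hours=1)

# NaT compares as False, so the first row is never reported as a gap.
gaps = deltas[deltas > threshold]
print(gaps)

# Alternative without a Series: subtract the index from itself, offset by one.
# The result is a TimedeltaIndex with one element fewer than the index.
raw = data.index[1:] - data.index[:-1]
print(raw)
```

Each entry in gaps is labelled with the timestamp at which a gap ends, and its value is the length of that gap. The offset subtraction gives the same differences but drops the leading NaT, so its positions are shifted by one relative to the original index.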

answered Oct 04 '22 07:10 by jezrael