Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How Can I Detect Gaps and Consecutive Periods In A Time Series In Pandas

Tags:

python

pandas

I have a pandas Dataframe that is indexed by Date. I would like to select all consecutive gaps by period and all consecutive days by Period. How can I do this?

Example of Dataframe with No Columns but a Date Index:

In [29]: import pandas as pd

In [30]: dates = pd.to_datetime(['2016-09-19 10:23:03', '2016-08-03 10:53:39','2016-09-05 11:11:30', '2016-09-05 11:10:46','2016-09-05 10:53:39'])

In [31]: ts = pd.DataFrame(index=dates)

As you can see there is a gap from 2016-08-03 and 2016-09-19. How do I detect these so I can create descriptive statistics, i.e. 40 gaps, with median gap duration of "x", etc. Also, I can see that 2016-09-05 and 2016-09-06 is a two day range. How I can detect these and also print descriptive stats?

Ideally the result would be returned as another Dataframe in each case since I want use other columns in the Dataframe to groupby.

like image 237
Noah Gift Avatar asked Oct 18 '16 21:10

Noah Gift


People also ask

How do you fill gaps in a time series?

A powerful approach to filling gaps in time series is Optimal Interpolation. This method is also known as Kriging. The advantage of this approach is that it provides a smoothed response based on the characteristics of the surrounding data and the known structure of the errors.

How do you find the range of a data set in pandas?

In pandas, we can determine Period Range with Frequency with the help of period_range(). pandas.

What functionality is provided for time series in pandas?

pandas contains extensive capabilities and features for working with time series data for all domains. Using the NumPy datetime64 and timedelta64 dtypes, pandas has consolidated a large number of features from other Python libraries like scikits.

How do you slice series pandas?

Pandas str. slice() method is used to slice substrings from a string present in Pandas series object. It is very similar to Python's basic principal of slicing objects that works on [start:stop:step] which means it requires three parameters, where to start, where to end and how much elements to skip.


1 Answers

Pandas version 1.0.1 has a built-in method DataFrame.diff() which you can use to accomplish this. One benefit is you can use pandas series functions like mean() to quickly compute summary statistics on the gaps series object

from datetime import datetime, timedelta
import pandas as pd

# Construct dummy dataframe
dates = pd.to_datetime([
    '2016-08-03',
    '2016-08-04',
    '2016-08-05',
    '2016-08-17',
    '2016-09-05',
    '2016-09-06',
    '2016-09-07',
    '2016-09-19'])
df = pd.DataFrame(dates, columns=['date'])

# Take the diff of the first column (drop 1st row since it's undefined)
deltas = df['date'].diff()[1:]

# Filter diffs (here days > 1, but could be seconds, hours, etc)
gaps = deltas[deltas > timedelta(days=1)]

# Print results
print(f'{len(gaps)} gaps with average gap duration: {gaps.mean()}')
for i, g in gaps.iteritems():
    gap_start = df['date'][i - 1]
    print(f'Start: {datetime.strftime(gap_start, "%Y-%m-%d")} | '
          f'Duration: {str(g.to_pytimedelta())}')
like image 167
Addison Klinke Avatar answered Sep 28 '22 08:09

Addison Klinke