I have a gappy time series stored in a pandas DataFrame with a DatetimeIndex. I want to identify the gaps so that I can split the series into continuous segments and process them individually (and, in some cases, glue together segments with short enough gaps between them).
There are two main ways I can see to do this. The first is to re-index, using various approaches, to get a regular time series and then look for the NA values that fill the gap regions. In my case that adds a lot of rows, because some gaps are lengthy, and it still takes an additional step to identify the continuous segments.
The other approach, which I'm currently using, is to difference the index with np.diff and find the gaps with np.where. But is there a more native pandas approach? This seems like a fairly common task. I note there are issues with np.diff and pandas under some combinations of numpy and pandas versions, so a pandas-only solution would be preferable.
What would be perfect is something like:
for segment in data.continuous_segments():
    # Process each segment
for the dataframe data.
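For reference, the np.diff / np.where approach described above might look roughly like the sketch below. The sample data, the name expected_step, and the one-day nominal interval are assumptions made for illustration; they are not from my real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: daily readings with one missing stretch.
idx = pd.to_datetime(
    ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-10", "2015-01-11"]
)
data = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=idx)

expected_step = np.timedelta64(1, "D")  # assumed nominal sampling interval

# Difference consecutive timestamps; a gap is any step longer than expected.
steps = np.diff(data.index.values)
breaks = np.where(steps > expected_step)[0] + 1  # positions where a new segment starts

# Turn the break positions into (start, stop) bounds and slice out each segment.
bounds = np.concatenate(([0], breaks, [len(data)]))
segments = [data.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
```

This works on the raw index values, so it avoids re-indexing, but the slicing bookkeeping is manual.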
This might work for you:
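import numpy as np
import pandas as pd

df = pd.DataFrame([["2015-01-01", 1], ["2015-01-02", 1], [np.nan, 1], [np.nan, 1], ["2015-01-10", 1], ["2015-01-11", 1]], columns=['timestamp', 'value'])
# Rows with a missing timestamp mark the gaps; the running count of missing rows labels each segment
continuous_segments = df[df.timestamp.notnull()].groupby(df.timestamp.isnull().cumsum())
for _, segment in continuous_segments:
    print(segment)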
    timestamp  value
0  2015-01-01      1
1  2015-01-02      1
    timestamp  value
4  2015-01-10      1
5  2015-01-11      1
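The example above marks gaps with missing timestamps in a column. If the timestamps are already in a DatetimeIndex, as in the question, a similar cumsum-and-groupby idea may work directly on the index differences. The following is only a sketch under assumptions: the sample data and the max_gap threshold are hypothetical, and gaps no longer than max_gap are treated as part of the same segment.

```python
import pandas as pd

# Hypothetical sample data with a short gap (2 days) and a long gap (7 days).
idx = pd.to_datetime(
    ["2015-01-01", "2015-01-02", "2015-01-04", "2015-01-11", "2015-01-12"]
)
data = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=idx)

max_gap = pd.Timedelta(days=2)  # assumed: gaps up to this length are glued together

# Time elapsed since the previous row (NaT for the first row).
steps = data.index.to_series().diff()

# A new segment starts wherever the step exceeds max_gap; cumsum labels the segments.
segment_id = (steps > max_gap).cumsum()

segments = [segment for _, segment in data.groupby(segment_id)]
for segment in segments:
    print(segment.index[0], "to", segment.index[-1], "rows:", len(segment))
```

Raising or lowering max_gap controls how aggressively nearby segments are merged, and no extra rows are created because the index is never re-sampled.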