
Data gap finding (not filling) in pandas?

I have a gappy time series stored in a pandas DataFrame with a DatetimeIndex. I want to identify the gaps so that I can find the continuous segments and process them individually (and in some cases glue together segments separated by short enough gaps).

There are two main ways I can see to do this. The first is to re-index to a regular time series and look for the NA values that fill the gap regions. In my case that adds a lot of rows (some gaps are lengthy), and it still takes an additional step to identify the continuous segments.

The other approach, and the one I'm currently using, is to difference the index with np.diff and find the gaps with np.where. But is there a more native pandas approach? This seems like a fairly common task. I note there are issues with np.diff and pandas for some combinations of numpy and pandas versions, so a pandas-only solution would be preferable.

What would be perfect would be something like

for segment in data.continuous_segments():
    # Process each segment

for the dataframe data.

Bogdanovist asked Oct 20 '22 11:10


1 Answer

This might work for you:

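import numpy as np
import pandas as pd

# Rows with a missing timestamp mark the gaps between segments.
df = pd.DataFrame([["2015-01-01", 1], ["2015-01-02", 1], [np.nan, 1], [np.nan, 1], ["2015-01-10", 1], ["2015-01-11", 1]], columns=['timestamp', 'value'])

# The running count of missing timestamps is constant within a segment, so it works as a group key.
continuous_segments = df[df.timestamp.notnull()].groupby(df.timestamp.isnull().cumsum())

for _, segment in continuous_segments:
    print(segment)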

    timestamp  value
0  2015-01-01      1
1  2015-01-02      1
    timestamp  value
4  2015-01-10      1
5  2015-01-11      1
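
If the timestamps live in a DatetimeIndex (as in the question) rather than in a column with NaN placeholders, the same cumsum idea can probably be applied to the differences between consecutive index values. Here is a minimal sketch, assuming the index is sorted; continuous_segments and max_gap are hypothetical names used for illustration, not part of pandas:

```python
import pandas as pd


def continuous_segments(data, max_gap):
    """Yield pieces of `data` whose consecutive index steps are <= max_gap.

    Hypothetical helper, not a pandas method. Assumes a sorted DatetimeIndex.
    """
    # Time elapsed since the previous timestamp (NaT for the first row).
    step = data.index.to_series().diff()
    # A step larger than max_gap starts a new segment; the running count of
    # such steps gives every row a segment label. NaT compares as False.
    segment_id = (step > max_gap).cumsum().to_numpy()
    for _, segment in data.groupby(segment_id):
        yield segment


# Sample data: two runs of daily values separated by an eight-day gap.
index = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11"])
data = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

segments = list(continuous_segments(data, pd.Timedelta(days=1)))
for segment in segments:
    print(segment)
```

Raising max_gap glues together segments separated by short gaps: with pd.Timedelta(days=10) the sample data above would come back as a single segment.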
maxymoo answered Oct 22 '22 00:10