
Pandas - merging start/end time ranges with short gaps

Tags: python, pandas

Say I have a series of start and end times for a given event:

import numpy as np
import pandas as pd
np.random.seed(1)
df = pd.DataFrame(np.random.randint(1, 5, 30).cumsum().reshape(-1, 2), columns=["start", "end"])

    start  end
0       2    6
1       7    8
2      12   14
3      18   20
4      24   25
5      26   28
6      29   33
7      35   36
8      39   41
9      44   45
10     48   50
11     53   54
12     58   59
13     62   63
14     65   68

I'd like to merge time ranges with a gap less than or equal to n, so for n = 1 the result would be:

fn(df, n = 1)

    start  end
0       2    8
2      12   14
3      18   20
4      24   33
7      35   36
8      39   41
9      44   45
10     48   50
11     53   54
12     58   59
13     62   63
14     65   68

I can't seem to find a way to do this with pandas without iterating and building up the result line-by-line. Is there some simpler way to do this?

Daniel F asked Jan 24 '23 07:01


1 Answer

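Subtract the previous row's end from each start to get the gap between consecutive ranges. Comparing that gap with N gives a boolean mask that is True wherever a new group begins, and the cumulative sum of the mask turns it into group labels. Pass those labels to groupby and aggregate the minimum start and the maximum end: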

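N = 1
# gap between each start and the previous row's end (NaN for the first row)
g = df['start'].sub(df['end'].shift())

# a gap greater than N starts a new group; cumsum() numbers the groups
df = df.groupby(g.gt(N).cumsum()).agg({'start':'min', 'end':'max'})
print(df)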
    start  end
0       2    8
1      12   14
2      18   20
3      24   33
4      35   36
5      39   41
6      44   45
7      48   50
8      53   54
9      58   59
10     62   63
11     65   68
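
If you also want the index labels from the question's expected output (0, 2, 3, 4, 7, ...), one option is to wrap the same idea in a function and keep the label of the first row in each group. This is only a sketch: `merge_ranges` is a made-up name, it assumes the rows are already sorted by `start`, and the `cummax()` call is an extra I added so that a range nested inside an earlier, longer one does not split a group. It is not needed for the data in the question.

```python
import numpy as np
import pandas as pd


def merge_ranges(df, n=1):
    """Merge rows whose gap to the preceding range is <= n.

    Hypothetical helper; assumes rows are sorted by 'start'.
    """
    # Gap between each start and the largest end seen in earlier rows.
    # cummax() guards against a range nested inside an earlier one.
    gap = df["start"] - df["end"].cummax().shift()
    # A new group begins wherever the gap exceeds n. The first row's
    # gap is NaN, which compares False, so it lands in group 0.
    group = gap.gt(n).cumsum()
    # Remember the index label of the first row in every group.
    first_label = df.index.to_series().groupby(group).first()
    out = df.groupby(group).agg(start=("start", "min"), end=("end", "max"))
    out.index = first_label.to_numpy()
    return out


# Same data as in the question.
np.random.seed(1)
df = pd.DataFrame(
    np.random.randint(1, 5, 30).cumsum().reshape(-1, 2),
    columns=["start", "end"],
)
merged = merge_ranges(df, n=1)
print(merged)
```

Because the function returns a new frame instead of overwriting `df`, you can call it again with a different `n` on the same input.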

jezrael answered Jan 26 '23 21:01