Add two series indexed by distinct but overlapping intervals

Tags: python, pandas

Suppose I have the following series:

import pandas as pd

index1 = pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4), (6, 7)])
x = pd.Series([1, 2, 3], index=index1)

index2 = pd.IntervalIndex.from_tuples([(1, 2), (5, 6.5)])
y = pd.Series([10, 20], index=index2)

z = x + y

Ideally, this is how I would like z to look:

(1.0, 2.0]    11
(2.0, 2.5]     1
(2.5, 3.0]     3
(3.0, 4.0]     2
(4.0, 5.0]     0
(5.0, 6.0]    20
(6.0, 6.5]    23
(6.5, 7.0]     3

Of course, when I do add them, I get a bunch of NaNs, because the indices don't align.
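A minimal sketch of the problem being described: series addition aligns on exact interval equality, and since no interval label appears in both indexes, every entry of the sum comes out NaN.

```python
import pandas as pd

index1 = pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4), (6, 7)])
x = pd.Series([1, 2, 3], index=index1)

index2 = pd.IntervalIndex.from_tuples([(1, 2), (5, 6.5)])
y = pd.Series([10, 20], index=index2)

# alignment is by label equality, not by overlap, so the result is
# indexed by the union of the five distinct intervals, all values NaN
z = x + y
```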

Should I upsample, and then add? (Also... is there a fancy way to downsample in pandas?)

How would I deal with one of the series having overlapping intervals inside its own index?

Context:

I'm trying to keep track of the number of students who have a class going on at a certain time.

I've scraped the class schedule, and I'm running into a problem when classes start and get out at different times.

Asked by Logan Schelly, Dec 12 '19



2 Answers

This is my approach; I hope it's self-explanatory:

# gather x and y and separate start and end time
df = (pd.concat((x,y))
        .to_frame(name='val')
        .assign(start=lambda x: x.index.left,
                end=lambda x: x.index.right)
     )

# unique time points (all interval edges)
idx = (pd.concat([df.index.left.to_series(),
                  df.index.right.to_series()])
    .drop_duplicates()
    .to_frame(name='pt')
    .assign(dummy=1)
)

# cross join, query the valid entries, and sum:
(df.assign(dummy=1)
   .merge(idx, on='dummy')
   .query('start < pt <= end')
   .groupby('pt')
   .val
   .sum()
)

Output (note that pt is the end point of each interval; the start point is the previous end point):

pt
2.0    11
2.5     1
3.0     3
4.0     2
6.0    20
6.5    23
7.0     3
Name: val, dtype: int64
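Not part of the answer itself, but as a hedged sketch of a possible follow-up: the pt endpoints can be mapped back onto intervals with IntervalIndex.from_breaks, provided the global start (1.0 here) and any break point whose stretch carried no classes (5.0 here, dropped by the groupby) are reinserted so empty stretches like (4.0, 5.0] reappear as 0. The names res, breaks, and z below are illustrative.

```python
import pandas as pd

# the summed values keyed by interval end points, as produced above
res = pd.Series([11, 1, 3, 2, 20, 23, 3],
                index=pd.Index([2.0, 2.5, 3.0, 4.0, 6.0, 6.5, 7.0], name='pt'))

# all break points, including the global start (1.0) and the
# uncovered point (5.0) that the groupby dropped
breaks = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 6.5, 7.0]
iix = pd.IntervalIndex.from_breaks(breaks)

# reindex on the right edges, filling empty stretches with 0
z = pd.Series(res.reindex(iix.right, fill_value=0).to_numpy(), index=iix)
```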
Answered by Quang Hoang, Oct 26 '22

Flatten all left and right values of x.index and y.index and filter to the unique values. Construct a new IntervalIndex from these break points, then use a list comprehension over it, checking overlaps against x and y and summing the matching slices, to build the final output.

import numpy as np

arr = np.unique(x.index.append(y.index).to_tuples().to_numpy().sum())

Out:
array([1. , 2. , 2.5, 3. , 4. , 5. , 6. , 6.5, 7. ])

iix = pd.IntervalIndex.from_breaks(arr)
s = pd.Series([x[x.index.overlaps(ix)].sum() + y[y.index.overlaps(ix)].sum() 
                            for ix in iix], index=iix)

Out:
(1.0, 2.0]    11
(2.0, 2.5]     1
(2.5, 3.0]     3
(3.0, 4.0]     2
(4.0, 5.0]     0
(5.0, 6.0]    20
(6.0, 6.5]    23
(6.5, 7.0]     3
dtype: int64
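A small illustration of the flattening trick above, under the same assumptions: .to_tuples() yields a 1-D object array of (left, right) tuples, and summing an object array reduces with "+", which for tuples means concatenation, so one flat tuple of all edges falls out.

```python
import numpy as np
import pandas as pd

idx = pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4)])
tups = idx.to_tuples().to_numpy()  # 1-D object array of (left, right) tuples
flat = tups.sum()                  # "+" on tuples concatenates them
breaks = np.unique(flat)           # sorted, de-duplicated break points
```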
Answered by Andy L., Oct 26 '22