Suppose I have the following series:
import pandas as pd
index1 = pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4), (6, 7)])
x = pd.Series([1, 2, 3], index=index1)
index2 = pd.IntervalIndex.from_tuples([(1, 2), (5, 6.5)])
y = pd.Series([10, 20], index=index2)
z = x+y
Ideally, this is how I would like z to look:
(1.0, 2.0] 11
(2.0, 2.5] 1
(2.5, 3.0] 3
(3.0, 4.0] 2
(4.0, 5.0] 0
(5.0, 6.0] 20
(6.0, 6.5] 23
(6.5, 7.0] 3
Of course, when I do add them, I get a bunch of NaNs, because the indices don't align.
Should I upsample, and then add? (Also... is there a fancy way to downsample in pandas?)
How would I deal with one of the series having overlapping intervals inside its own index?
I'm trying to keep track of the number of students who have a class going on at a certain time.
I've scraped the class schedule, and I'm running into a problem when classes start and get out at different times.
This is my approach; I hope it's self-explanatory:
# gather x and y and separate start and end time
df = (pd.concat((x, y))
        .to_frame(name='val')
        .assign(start=lambda d: d.index.left,
                end=lambda d: d.index.right)
     )
# unique time points
# (Series.append was removed in pandas 2.0, so pd.concat is used instead)
idx = (pd.concat((df.index.left.to_series(),
                  df.index.right.to_series()))
         .drop_duplicates()
         .to_frame(name='pt')
         .assign(dummy=1)
      )
# cross join, query the valid entries, and sum:
(df.assign(dummy=1)
   .merge(idx, on='dummy')
   .query('start < pt <= end')
   .groupby('pt')
   .val
   .sum()
)
Output (note that pt is the end point of each interval; the start point is the previous end point):
pt
2.0 11
2.5 1
3.0 3
4.0 2
6.0 20
6.5 23
7.0 3
Name: val, dtype: int64
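To get the interval-labelled layout the question asks for, the end-point result can be mapped back onto consecutive breakpoints. This is only a minimal sketch: `res`, `breaks`, and `intervals` are hypothetical names (the code above does not assign its result), and the values are copied from the output above rather than recomputed.

```python
import pandas as pd

# Result of the cross-join approach, keyed by each sub-interval's end point.
# `res` is a hypothetical name; the values are copied from the output above.
res = pd.Series([11, 1, 3, 2, 20, 23, 3],
                index=pd.Index([2.0, 2.5, 3.0, 4.0, 6.0, 6.5, 7.0], name='pt'),
                name='val')

# Every left/right end point of x.index and y.index from the question, sorted.
breaks = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 6.5, 7.0]
intervals = pd.IntervalIndex.from_breaks(breaks)  # closed on the right by default

# Look up each interval by its right end; uncovered gaps such as (4.0, 5.0] become 0.
z = pd.Series(res.reindex(intervals.right, fill_value=0).to_numpy(),
              index=intervals)
print(z)
```

The `fill_value=0` step is what restores the (4.0, 5.0] row, which the query drops because no interval covers it.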
Flatten all left and right values of x.index and y.index and keep only the unique values. Construct a new IntervalIndex from these unique values. Then use a list comprehension over the new IntervalIndex, selecting the entries of x and y whose intervals overlap each new interval, to construct the final output.
import numpy as np
arr = np.unique(x.index.append(y.index).to_tuples().to_numpy().sum())
Out:
array([1. , 2. , 2.5, 3. , 4. , 5. , 6. , 6.5, 7. ])
iix = pd.IntervalIndex.from_breaks(arr)
s = pd.Series([x[x.index.overlaps(ix)].sum() + y[y.index.overlaps(ix)].sum()
               for ix in iix], index=iix)
Out[379]:
(1.0, 2.0] 11
(2.0, 2.5] 1
(2.5, 3.0] 3
(3.0, 4.0] 2
(4.0, 5.0] 0
(5.0, 6.0] 20
(6.0, 6.5] 23
(6.5, 7.0] 3
dtype: int64
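The same idea can be wrapped in a small helper that takes any number of series, which may suit the class-schedule case where there are many overlapping intervals. This is a sketch under assumptions, not part of the answer above: `add_interval_series` is a hypothetical name, and it assumes every index is right-closed, as in the question.

```python
import numpy as np
import pandas as pd

def add_interval_series(*series):
    """Sum interval-indexed Series over the union of their breakpoints.

    Hypothetical helper; assumes all indexes are right-closed IntervalIndexes.
    """
    # Collect every left and right end point, sorted and de-duplicated.
    points = np.unique(np.concatenate(
        [np.concatenate([s.index.left.to_numpy(), s.index.right.to_numpy()])
         for s in series]))
    pieces = pd.IntervalIndex.from_breaks(points)
    # For each elementary piece, add up the values of all intervals covering it.
    totals = [sum(s[s.index.overlaps(piece)].sum() for s in series)
              for piece in pieces]
    return pd.Series(totals, index=pieces)

# Usage with the series from the question.
x = pd.Series([1, 2, 3],
              index=pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4), (6, 7)]))
y = pd.Series([10, 20],
              index=pd.IntervalIndex.from_tuples([(1, 2), (5, 6.5)]))
z = add_interval_series(x, y)
print(z)
```

Like the list comprehension above, this checks every series against every piece, so the cost grows with the number of breakpoints times the number of intervals. That is fine for a schedule-sized dataset but may be slow for very large ones.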