Add two series indexed by distinct but overlapping intervals

Suppose I have the following series:

import pandas as pd

index1 = pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4), (6, 7)])
x = pd.Series([1, 2, 3], index=index1)

index2 = pd.IntervalIndex.from_tuples([(1, 2), (5, 6.5)])
y = pd.Series([10, 20], index=index2)

z = x+y

Ideally, this is how I would like z to look:

(1.0, 2.0]    11
(2.0, 2.5]    1
(2.5, 3.0]    3
(3.0, 4.0]    2
(4.0, 5.0]    0
(5.0, 6.0]    20
(6.0, 6.5]    23
(6.5, 7.0]    3

Of course, when I do add them, I get a bunch of NaNs, because the indices don't align.

Should I upsample, and then add? (Also... is there a fancy way to downsample in pandas?)

How would I deal with one of the series having overlapping intervals inside its own index?

Context:

I'm trying to keep track of the number of students who have a class going on at a certain time.

I've scraped the class schedule, and I'm running into a problem when classes start and get out at different times.

This is my approach; I hope it is self-explanatory:

# gather x and y and separate start and end time
df = (pd.concat((x, y))
        .to_frame(name='val')
        .assign(start=lambda d: d.index.left,
                end=lambda d: d.index.right)
     )

# unique time points (Series.append was removed in pandas 2.0, so use pd.concat)
idx = (pd.concat((df.index.left.to_series(), df.index.right.to_series()))
    .drop_duplicates()
    .to_frame(name='pt')
    .assign(dummy=1)
)

# cross join, query the valid entries, and sum:
(df.assign(dummy=1)
   .merge(idx, on='dummy')
   .query('start < pt <= end')
   .groupby('pt')
   .val
   .sum()
)

Output (note that pt is the end point of each interval; the start point is the previous end point). The empty interval (4.0, 5.0] does not appear, because no row of x or y covers it.

pt
2.0    11
2.5     1
3.0     3
4.0     2
6.0    20
6.5    23
7.0     3
Name: val, dtype: int64
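If the interval-indexed layout from the question is needed, the endpoint-indexed result can be mapped back onto an IntervalIndex. The following is a minimal sketch, not part of the original answer: the names by_end, breaks and pieces are made up for illustration, and by_end is hard-coded from the output printed above to keep the example short.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3],
              index=pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4), (6, 7)]))
y = pd.Series([10, 20],
              index=pd.IntervalIndex.from_tuples([(1, 2), (5, 6.5)]))

# Endpoint-indexed sums, copied from the output above (assumption: hard-coded
# here instead of re-running the cross join).
by_end = pd.Series([11, 1, 3, 2, 20, 23, 3],
                   index=pd.Index([2.0, 2.5, 3.0, 4.0, 6.0, 6.5, 7.0], name="pt"))

# All breakpoints from both indexes, sorted and de-duplicated.
breaks = np.unique(np.concatenate([x.index.left, x.index.right,
                                   y.index.left, y.index.right]))
pieces = pd.IntervalIndex.from_breaks(breaks)

# Look up each piece by its right end point; pieces that nothing covers get 0.
z = pd.Series(by_end.reindex(pieces.right, fill_value=0).to_numpy(), index=pieces)
print(z)
```

The reindex step is what restores the (4.0, 5.0] row with a value of 0.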

Collect all left and right endpoints of x.index and y.index and keep only the unique values. Construct a new IntervalIndex from these unique values. Then use a list comprehension over the new IntervalIndex: for each interval, select the rows of x and y whose intervals overlap it and sum them to build the final output.

import numpy as np

arr = np.unique(np.concatenate([x.index.left, x.index.right,
                                y.index.left, y.index.right]))

Out:
array([1. , 2. , 2.5, 3. , 4. , 5. , 6. , 6.5, 7. ])

iix = pd.IntervalIndex.from_breaks(arr)
s = pd.Series([x[x.index.overlaps(ix)].sum() + y[y.index.overlaps(ix)].sum() 
                            for ix in iix], index=iix)

Out[379]:
(1.0, 2.0]    11
(2.0, 2.5]     1
(2.5, 3.0]     3
(3.0, 4.0]     2
(4.0, 5.0]     0
(5.0, 6.0]    20
(6.0, 6.5]    23
(6.5, 7.0]     3
dtype: int64
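For a long class schedule, calling overlaps once per output interval may get slow. A different technique, a sweep over the breakpoints with a running total, produces the same numbers on this example. This is only a sketch under the assumption that every interval is closed on the right, as in the question; the names lefts, rights, vals, deltas and running are made up for illustration. It also copes with a series whose own intervals overlap, because each interval simply contributes its own pair of changes.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3],
              index=pd.IntervalIndex.from_tuples([(1, 3), (2.5, 4), (6, 7)]))
y = pd.Series([10, 20],
              index=pd.IntervalIndex.from_tuples([(1, 2), (5, 6.5)]))

# Stack the end points and values of both series.
lefts = np.concatenate([x.index.left, y.index.left])
rights = np.concatenate([x.index.right, y.index.right])
vals = np.concatenate([x.to_numpy(), y.to_numpy()])

# +value where an interval opens, -value where it closes.
deltas = pd.Series(np.concatenate([vals, -vals]),
                   index=np.concatenate([lefts, rights]))

# Net change at each breakpoint (sorted by groupby), then a running total.
running = deltas.groupby(level=0).sum().cumsum()

# running[b] is the total on the piece that starts at b;
# the last breakpoint starts no piece, so its entry is dropped.
pieces = pd.IntervalIndex.from_breaks(running.index.to_numpy())
z = pd.Series(running.to_numpy()[:-1], index=pieces)
print(z)
```

In the class-schedule context, vals would be the enrolment of each class and z the number of students in class during each time slice.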

Tags:

python

pandas

Context:

Logan Schelly

2 Answers

Quang Hoang

Andy L.
