I have a dataframe, where the left
column is the left - most location of an object, and the right
column is the right most location. I need to group the objects if they overlap, or they overlap objects that overlap (recursively).
So, for example, if this is my dataframe:
left right
0 0 4
1 5 8
2 10 13
3 3 7
4 12 19
5 18 23
6 31 35
so lines 0
and 3
overlap - thus they should be on the same group, and also line 1
is overlapping line 3
- thus it joins the group.
So, for this example the output should be something like that:
left right group
0 0 4 0
1 5 8 0
2 10 13 1
3 3 7 0
4 12 19 1
5 18 23 1
6 31 35 2
I thought of various directions, but didn't figure it out (without an ugly for
).
Any help will be appreciated!
I found the accepted solution (update: now deleted) to be misleading because it fails to generalize to similar cases. e.g. for the following example:
df = pd.DataFrame({'left': [0,5,10,3,12,13,18,31],
'right':[4,8,13,7,19,16,23,35]})
df
The suggested aggregate function outputs the following dataframe (note that the 18-23 should be in group 1, along with 12-19).
One solution is using the following approach (based on a method for combining intervals posted by @CentAu):
# Union intervals by @CentAu
from sympy import Interval, Union
def union(data):
""" Union of a list of intervals e.g. [(1,2),(3,4)] """
intervals = [Interval(begin, end) for (begin, end) in data]
u = Union(*intervals)
return [u] if isinstance(u, Interval) \
else list(u.args)
# Create a list of intervals
df['left_right'] = df[['left', 'right']].apply(list, axis=1)
intervals = union(df.left_right)
# Add a group column
df['group'] = df['left'].apply(lambda x: [g for g,l in enumerate(intervals) if
l.contains(x)][0])
...which outputs:
Can you try this, use rolling
max
and rolling
min
, to find the intersection of the range :
df=df.sort_values(['left','right'])
df['Group']=((df.right.rolling(window=2,min_periods=1).min()-df.left.rolling(window=2,min_periods=1).max())<0).cumsum()
df.sort_index()
Out[331]:
left right Group
0 0 4 0
1 5 8 0
2 10 13 1
3 3 7 0
4 12 19 1
5 18 23 1
6 31 35 2
For example , (1,3) and (2,4) To find the intersection
mix(3,4)-max(1,2)=1 ; 1 is more than 0; then two intervals have intersection
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With