Efficient way to find overlapping of N rectangles

Tags:

I am trying to find an efficient solution for finding overlapping of n rectangles where rectangles are stored in two separate lists. We are looking for all rectangles in listA that overlap with rectangles in listB (and vice versa). Comparing one element from the first list to second list could take immensely large amount of time. I am looking for an efficient solution.

I have two list of rectangles

rect = Rectangle(10, 12, 56, 15)
rect2 = Rectangle(0, 0,1, 15)
rect3 = Rectangle (10,  12, 56, 15)

listA = [rect, rect2]
listB = [rect3]

which is created from the class:

import numpy as np
import itertools as it

class  Rectangle(object):
    def __init__(self, left, right, bottom, top):
        self.left = left
        self.bottom = right
        self.right = bottom
        self.top = top

    def overlap(r1, r2):
        hoverlaps = True
        voverlaps = True
        if (r1.left > r2.right) or (r1.right < r2.left):
            hoverlaps = False
        if (r1.top < r2.bottom) or (r1.bottom > r2.top):
            voverlaps = False
        return hoverlaps and voverlaps

I need to compare rectangle in listA to listB the code goes like this which is highly inefficient - comparing one by one

for a in it.combinations(listB):
    for b in it.combinations(listA):
        if a.overlap(b):

Any better efficient method to deal with the problem?

861

asked Nov 16 '16 00:11

karu

2 Answers

First off: As with many a problem from computational geometry, specifying the parameters for order-of-growth analysis needs care: calling the lengths of the lists m and n, the worst case in just those parameters is Ω(m×n), as all areas might overlap (in this regard, the algorithm from the question is asymptotically optimal). It is usual to include the size of the output: t = f(m, n, o) (Output-sensitive algorithm).
Trivially, f ∈ Ω(m+n+o) for the problem presented.

Line Sweep is a paradigm to reduce geometrical problems by one dimension - in its original form, from 2D to 1D, plane to line.

Imagine all the rectangles in the plane, different colours for the lists.
Now sweep a line across this plane - left to right, conventionally, and infinitesimally further to the right "for low y-coordinates" (handle coordinates in increasing x-order, increasing y-order for equal x).
For all of this sweep (or scan), per colour keep one set representing the "y-intervals" of all rectangles at the current x-coordinate, starting empty. (In a data structure supporting insertion, deletion, and enumerating all intervals that overlap a query interval: see below.)
Meeting the left side of a rectangle, add the segment to the data structure for its colour. Report overlapping intervals/rectangles in any other colour.
At a right side, remove the segment.
Depending on the definition of "overlapping", handle left sides before right sides - or the other way round.

There are many data structures supporting insertion and deletion of intervals, and finding all intervals that overlap a query interval. Currently, I think Augmented Search-Trees may be easiest to understand, implement, test, analyse…
Using this, enumerating all o intersecting pairs of axis-aligned rectangles (a, b) from listA and listB should be possible in O((m+n)log(m+n)+o) time and O(m+n) space. For sizeable problem instances, avoid data structures needing more than linear space ((original) Segment Trees, for one example pertaining to interval overlap).

Another paradigm in algorithm design is Divide&Conquer: with a computational geometry problem, choose one dimension in which the problem can be divided into independent parts, and a coordinate such that the sub-problems for "coordinates below" and "coordinates above" are close in expected run-time. Quite possibly, another (and different) sub-problem "including the coordinate" needs to be solved. This tends to be beneficial when a) the run-time for solving sub-problems is "super-log-linear", and b) there is a cheap (linear) way to construct the overall solution from the solutions for the sub-problems.
This lends itself to concurrent problem solving, and can be used with any other approach for sub-problems, including line sweep.

There will be many ways to tweak each approach, starting with disregarding input items that can't possibly contribute to the output. To "fairly" compare implementations of algorithms of like order of growth, don't aim for a fair "level of tweakedness": try to invest fair amounts of time for tweaking.

answered Oct 01 '22 12:10

greybeard

A couple of potential minor efficiency improvements. First, fix your overlap() function, it potentially does calculations it needn't:

def overlap(r1, r2):

    if r1.left > r2.right or r1.right < r2.left:
        return False

    if r1.top < r2.bottom or r1.bottom > r2.top:
        return False

    return True

Second, calculate the contaning rectangle for one of the lists and use it to screen the other list -- any rectangle that doesn't overlap the container doesn't need to be tested against all the rectangles that contributed to it:

def containing_rectangle(rectangles):
    return Rectangle(min(rectangles, key=lambda r: r.left).left,
        max(rectangles, key=lambda r: r.right).right,
        min(rectangles, key=lambda r: r.bottom).bottom,
        max(rectangles, key=lambda r: r.top).top
    )

c = containing_rectangle(listA)

for b in listB:
    if b.overlap(c):
        for a in listA:
            if b.overlap(a):

In my testing with hundreds of random rectangles, this avoided comparisons on the order of single digit percentages (e.g. 2% or 3%) and occasionally increased the number of comparisons. However, presumably your data isn't random and might fare better with this type of screening.

Depending on the nature of your data, you could break this up into a container rectangle check for each batch of 10K rectangles out of 50K or what ever slice gives you maximum efficiency. Possibly presorting the rectangles (e.g. by their centers) before assigning them to container batches.

We can break up and batch both lists with container rectangles:

listAA = [listA[x:x + 10] for x in range(0, len(listA), 10)]

for i, arrays in enumerate(listAA):
    listAA[i] = [containing_rectangle(arrays)] + arrays

listBB = [listB[x:x + 10] for x in range(0, len(listB), 10)]

for i, arrays in enumerate(listBB):
    listBB[i] = [containing_rectangle(arrays)] + arrays

for bb in listBB:
    for aa in listAA:
        if bb[0].overlap(aa[0]):
            for b in bb[1:]:
                if b.overlap(aa[0]):
                    for a in aa[1:]:
                        if b.overlap(a):

With my random data, this decreased the comparisons on the order of 15% to 20%, even counting the container rectangle comparisons. The batching of rectangles above is arbitrary and you can likely do better.

answered Oct 01 '22 12:10

cdlane

Related questions
                            
                                ImportError: No module named mpl_toolkits with maptlotlib 1.3.0 and py2exe
                            
                                pandas plot dataframe barplot with colors by category
                            
                                Transparency for Poly3DCollection plot in matplotlib
                            
                                How to read the last MB of a very large text file
                            
                                Python - How to save functions
                            
                                how to align text to the left?
                            
                                h5py: Correct way to slice array datasets
                            
                                Python programming functional vs. imperative code
                            
                                Beautiful Soup 4: Remove comment tag and its content
                            
                                How to convert all Decimals in a Python data structure to string?
                            
                                ValueError: Incorrect timezone setting while migrating manage.py file in Django
                            
                                Parsing of table from .docx file [closed]
                            
                                Print real roots only in numpy
                            
                                Can I create model in Django without automatic ID?
                            
                                How to convert SQL query results into a python dictionary
                            
                                Getting tweet replies to a particular tweet from a particular user
                            
                                How to filter based on array value in PySpark?
                            
                                Use Unix-based commands with Anaconda in Windows Operating System
                            
                                How do you automate pyspark jobs on emr using boto3 (or otherwise)?
                            
                                Rename variable scope of saved model in TensorFlow

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Efficient way to find overlapping of N rectangles

Tags:

python

algorithm

computational-geometry

karu

People also ask

2 Answers

greybeard

cdlane

Recent Activity

Donate For Us