
A range intersection algorithm better than O(n)?

Range intersection is a simple, but non-trivial problem.

It has been answered twice already:

  • Find number range intersection
  • Comparing date ranges

The first solution is O(n), and the second is for a database (which is, of course, less than O(n)).

I have the same problem, but for a large n, and I am not working within a database.

This problem seems to be very similar to "Store 2D points for quick retrieval of those inside a rectangle", but I don't see how it maps.

So what data structure would you store the set of ranges in, such that a search on a range costs less than O(n)? (Extra credit for using libraries available for Java)

EDIT:

I want to get a subset of all intersecting ranges, meaning the search range could intersect multiple ranges.

The method that needs to be less than O(n) in Java is:

public class RangeSet {
    ....
    public Set<Range> intersects(Range range);
    ....
}

Where Range is just a class containing a pair of int start and end.
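
For reference, the O(n) baseline that the question wants to beat might look like the sketch below. It assumes closed ranges (both endpoints inclusive), which the question does not specify, and the names LinearRangeSet and add are hypothetical. It returns a List rather than a Set only to avoid writing equals/hashCode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Range: a closed interval [start, end] of ints (assumption).
final class Range {
    final int start;
    final int end;

    Range(int start, int end) {
        if (start > end) throw new IllegalArgumentException("start > end");
        this.start = start;
        this.end = end;
    }

    // Two closed ranges intersect when each starts no later than the other ends.
    boolean intersects(Range other) {
        return start <= other.end && other.start <= end;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + "]";
    }
}

// Baseline: test every stored range against the query, O(n) per search.
final class LinearRangeSet {
    private final List<Range> ranges = new ArrayList<>();

    void add(Range r) {
        ranges.add(r);
    }

    List<Range> intersects(Range query) {
        List<Range> hits = new ArrayList<>();
        for (Range r : ranges) {
            if (r.intersects(query)) hits.add(r);
        }
        return hits;
    }
}
```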

This is not an impossible question; I already have a solution. I just wanted to see if there was a more standard or simpler way of doing it.

Pyrolistical asked Nov 19 '08 22:11



2 Answers

The standard approach is to use an interval tree.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

The trivial solution is to visit each interval and test whether it intersects the given point or interval, which requires O(n) time, where n is the number of intervals in the collection. Since a query may return all intervals, for example if the query is a large interval intersecting all intervals in the collection, this is asymptotically optimal; however, we can do better by considering output-sensitive algorithms, where the runtime is expressed in terms of m, the number of intervals produced by the query. Interval trees have a query time of O(log n + m) and an initial creation time of O(n log n), while limiting memory consumption to O(n). After creation, interval trees may be dynamic, allowing efficient insertion and deletion of an interval in O(log n). If the endpoints of intervals are within a small integer range (e.g., in the range [1,...,O(n)]), faster data structures exist[1] with preprocessing time O(n) and query time O(1+m) for reporting m intervals containing a given query point.
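
As a rough illustration of the idea, here is a minimal static sketch in Java. It is a simplified variant, not a library API and not necessarily the exact structure described above: a balanced binary search tree keyed on interval start, where each node also stores the largest end in its subtree so whole subtrees can be skipped. All names (IntervalTree, Interval, query) are hypothetical, intervals are assumed closed, and insertion and deletion are left out. This simpler variant's worst-case query cost is on the order of O((m + 1) log n) rather than the O(log n + m) quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class IntervalTree {
    // Closed interval [start, end] (assumption: both endpoints inclusive).
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    private static final class Node {
        final Interval interval;
        Node left, right;
        int maxEnd; // largest end value anywhere in this subtree

        Node(Interval interval) {
            this.interval = interval;
            this.maxEnd = interval.end;
        }
    }

    private final Node root;

    // Build once: sort by start, then recursively pick the middle element.
    IntervalTree(List<Interval> intervals) {
        Interval[] sorted = intervals.toArray(new Interval[0]);
        Arrays.sort(sorted, Comparator.comparingInt((Interval i) -> i.start));
        root = build(sorted, 0, sorted.length - 1);
    }

    private static Node build(Interval[] a, int lo, int hi) {
        if (lo > hi) return null;
        int mid = lo + (hi - lo) / 2;
        Node n = new Node(a[mid]);
        n.left = build(a, lo, mid - 1);
        n.right = build(a, mid + 1, hi);
        if (n.left != null) n.maxEnd = Math.max(n.maxEnd, n.left.maxEnd);
        if (n.right != null) n.maxEnd = Math.max(n.maxEnd, n.right.maxEnd);
        return n;
    }

    // Returns every stored interval that intersects [qStart, qEnd], in start order.
    List<Interval> query(int qStart, int qEnd) {
        List<Interval> out = new ArrayList<>();
        collect(root, qStart, qEnd, out);
        return out;
    }

    private static void collect(Node n, int qStart, int qEnd, List<Interval> out) {
        // Prune: nothing in this subtree ends at or after the query start.
        if (n == null || n.maxEnd < qStart) return;
        collect(n.left, qStart, qEnd, out);
        if (n.interval.start <= qEnd && qStart <= n.interval.end) out.add(n.interval);
        // Everything to the right starts at or after this node's start.
        if (n.interval.start <= qEnd) collect(n.right, qStart, qEnd, out);
    }
}
```

For the asker's RangeSet, intersects(range) would delegate to something like query(range.start, range.end) and copy the result into a Set.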

Rafał Dowgird answered Nov 06 '22 18:11



Non-overlapping ranges:

Prep O(n log n):

  1. Make an array / vector of the ranges.
  2. Sort the vector by the end of the range (break ties by sorting by the start of the range).

Search:

  1. Use binary search to find the first range with an End value >= TestRange.Start.
  2. Iterate starting at the binary search result until you find a range with Start > TestRange.End:

    2a. If the current range is within the TestRange, add it to your result.
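
The prep and search steps above could be sketched in Java roughly as follows. This assumes closed ranges and, as the answer requires, that the stored ranges do not overlap each other; under that assumption the starts are sorted as well, so every range visited before the scan stops intersects the test range and is reported. The names SortedRangeIndex and Range are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortedRangeIndex {
    // Closed range [start, end] (assumption: both endpoints inclusive).
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    private final Range[] ranges;

    // Prep, O(n log n): copy and sort by end, breaking ties by start.
    // Assumes the input ranges are pairwise non-overlapping.
    SortedRangeIndex(List<Range> input) {
        ranges = input.toArray(new Range[0]);
        Arrays.sort(ranges, (a, b) -> a.end != b.end
                ? Integer.compare(a.end, b.end)
                : Integer.compare(a.start, b.start));
    }

    // Search, O(log n + m): binary search, then a short forward scan.
    List<Range> intersects(Range test) {
        // Step 1: first index whose end is >= test.start.
        int lo = 0;
        int hi = ranges.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ranges[mid].end >= test.start) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        // Step 2: scan forward until a range starts after test.end.
        List<Range> hits = new ArrayList<>();
        for (int i = lo; i < ranges.length && ranges[i].start <= test.end; i++) {
            hits.add(ranges[i]);
        }
        return hits;
    }
}
```

If the stored ranges can overlap each other, the early stop in step 2 is no longer safe, because a later range in end order may start before an earlier one.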

Adam Tegen answered Nov 06 '22 18:11
