Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data Structure for Quickly Checking Numbers against Overlapping Ranges

Let's say I have the following number ranges:

0-500 0-100 75-127 125-157 130-198 198-200

Now, let's say I need to be able to check any given number and see which ranges it is in. What sort of data structure would be used most efficiently to tell that, say, the number 100 belongs to the ranges 0-500, 0-100, and 75-127? Do I just want a binary tree holding the starting values? In that case, would it be feasible for each node in the tree to hold several objects containing each range at that starting point?

Note that I only need to retrieve for this particular application, I don't really see myself needing to modify it mid-process, so retrieval speed is by far my priority.

Thanks!

like image 324
J S Avatar asked Apr 12 '14 05:04

J S


People also ask

How do you check if intervals are overlapping?

1) Sort all intervals in increasing order of start time. This step takes O(nLogn) time. 2) In the sorted array, if start time of an interval is less than end of previous interval, then there is an overlap.

What is interval tree data structure?

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point.

What is range in data structure?

In data structures, a range query consists of preprocessing some input data into a data structure to efficiently answer any number of queries on any subset of the input.

How do interval trees work?

An interval tree has a leaf node for every elementary interval. On top of these leaves is built a complete binary tree. Each internal node of the tree stores, as its key, the integer that separates the elementary intervals in its left and right subrees.


2 Answers

What you need is interval tree. Interval trees are quite general concept and keep a slightly different data in nodes for every problem. In your case each node would keep a list of input intervals that cover the interval node is representing.

like image 191
Bartosz Marcinkowski Avatar answered Oct 06 '22 00:10

Bartosz Marcinkowski


Let R denote the number of ranges possible. (For your example R=6)

Create R hashtables such that each hash table can hold only one range. For your example you need to create 6 hashtables. The first hash table R1will contain values from 0-500 only.

Populate hashtable.

Each number will go into appropriate hashtables. For your example number 100 will go into R1, R2, R3. If R is large, then you would need to create lot of hashtables. But then the total space is bounded by the actual data stored in all the hashtables put together.

Retrival:

For any given number check if it is present in each of the R hashtables. You can further optimize by choosing which hashtables to look. For example for 100 you only need to look into 3 out of 6 hashtables.

Time complexity:

Searching for a single value in a hash table takes constant time (on average). So Amortized O(1) to see if a number is present in a hashtable.

Amortized O(R) to produce the output as we need to look at all the hashtables to produce the output

like image 33
arunmoezhi Avatar answered Oct 05 '22 23:10

arunmoezhi