I am wondering if anyone knows of a data structure which would efficiently handle the following situation:
The data structure should store several, possibly overlapping, variable length ranges on some continuous timescale.
For example, you might add the ranges a:[0,3], b:[4,7], c:[0,9]
.
Insertion time does not need to be particularly efficient.
Retrievals would take a range as a parameter, and return all the ranges in the set that overlap with the range, for example:
Get(1,2)
would return a and c. Get(6,7)
would return b and c. Get(2,6)
would return all three.
Retrievals need to be as efficient as possible.
An array is the simplest and most widely used data structure. Other data structures like stacks and queues are derived from arrays. Here's an image of a simple array of size 4, containing elements (1, 2, 3 and 4).
In data structures, a range query consists of preprocessing some input data into a data structure to efficiently answer any number of queries on any subset of the input.
Data structure is a storage that is used to store and organize data. It is a way of arranging data on a computer so that it can be accessed and updated efficiently. Depending on your requirement and project, it is important to choose the right data structure for your project.
So, if you have only 10000 nodes to store, you should choose a compact representation (if possible) and store this representation directly in an array on the stack.
One data structure you could use is a one-dimensional R-tree. These are designed to deal with ranges and to provide efficient retrieval. You will also learn about Allen's Operators; there are a dozen other relationships between time intervals than just 'overlaps'.
There are other questions on SO that impinge on this area, including:
You could go for a binary tree, that stores the ranges in a hierarchy. Starting from the root node, that represents an all-encompassing range divided it its middle, you test if your range you are trying to insert belong to the left subrange, right subrange, or both, and recursively carry on in the matching subnodes until you reach a certain depth, at which you save the actual range.
For lookup, you test your input range against the left and right subranges of the top node, and dive in the ones which overlap, repeating until you have reached actual ranges that you save.
This way, retrieval has a logarithmic complexity. You'd still need to manage duplicates in your retrieval, as some ranges are going to belong to several nodes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With