Data structure for range query

Q: What is a range query?

A range query is a common database operation that retrieves all records where some value is between an upper and lower boundary. For example, list all employees with 3 to 5 years' experience.

Q: What is a range search tree?

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions.

Q: What is the range query problem?

In computer science, a range minimum query (RMQ) solves the problem of finding the minimal value in a sub-array of an array of comparable objects. Range minimum queries have several use cases in computer science, such as the lowest common ancestor problem and the longest common prefix problem (LCP).

Tags:

algorithm

data-structures

I was recently asked a coding question about the problem below. I have a couple of solutions, but I am not sure they are the most efficient.

Problem:

Write a program to track a set of text ranges. The start point and end point of each range are strings.

Text range example : [AbA-Ef]
 Aa would fall before this range
 AB would fall inside this range
 etc.

String comparison uses the ordering 'A' < 'a' < 'B' < 'b' ... 'Z' < 'z'.
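One way to get this ordering is a sort key that interleaves upper and lower case. This is a minimal sketch: the names `sort_key` and `in_range` are hypothetical, it assumes the strings contain only ASCII letters, and it treats the range as closed at both ends, which the problem statement does not specify.

```python
def sort_key(text):
    """Map a string to a tuple that sorts as 'A' < 'a' < 'B' < 'b' ... 'Z' < 'z'.

    Assumption: text contains only ASCII letters.
    """
    # Each letter becomes 2 * (position in alphabet), plus 1 if it is lower case,
    # so the upper-case form of a letter sorts just before its lower-case form.
    return tuple(
        2 * (ord(ch.lower()) - ord("a")) + (0 if ch.isupper() else 1)
        for ch in text
    )


def in_range(text, low, high):
    """Return True if low <= text <= high under the custom ordering (closed range assumed)."""
    return sort_key(low) <= sort_key(text) <= sort_key(high)


print(sorted(["b", "B", "a", "A"], key=sort_key))
print(in_range("Aa", "AbA", "Ef"))
```

Because the key is an ordinary tuple, it can be used with `sorted`, with binary search, or as the key of any ordered container.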

We need to support the following operations on these ranges:

Add range - this should merge ranges where applicable
Delete range - this deletes a range from the tracked ranges and recomputes the ranges
Query range - given a character, the function should return whether or not it is part of any tracked range

Note that the tracked ranges can be discontinuous.

My solutions:

I came up with two approaches:

Store the ranges as a doubly linked list, or
Store the ranges in some sort of balanced tree whose leaf nodes hold the actual data and are connected to each other as a linked list.

Do you think these solutions are good enough, or can you think of a better way of doing this so that all three operations perform as well as possible?


asked Oct 04 '12 04:10

Lance Reynolds

2 Answers

You are probably looking for an interval tree.

Use that data structure with a custom comparator to define what counts as "in range", and you will be able to perform the required operations efficiently.

Note that an interval tree is actually an efficient way to implement your second idea (storing the ranges in some sort of balanced tree).


answered Oct 01 '22 13:10

amit

I'm not clear on what the "delete range" operation is supposed to do. Does it:

Delete a previously inserted range, and recompute the merge of the remaining ranges?
Stop tracking the deleted range, regardless of how many times parts of it have been added?

That doesn't make a huge difference algorithmically; it's just bookkeeping. But it's important to clarify. Also, are the ranges closed or half-open? (This is another detail which doesn't affect the algorithm but does affect the implementation.)

The basic approach to this problem is to merge the tracked set into a sorted list of disjoint (non-overlapping) ranges; either as a vector or a binary search tree, or basically any structure which supports O(log n) searching.

One approach is to put both endpoints of every disjoint range into the data structure, in sorted order. To find out if a target value is in a range, find the index of the smallest endpoint greater than the target. If the index is odd (counting from zero), the target is in some range; even means it's outside.
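The endpoint-parity lookup can be sketched with a sorted list and binary search. This is an illustration under stated assumptions, not a definitive implementation: the function name `contains_by_parity` is hypothetical, the ranges are taken to be half-open `[start, end)`, and plain integers stand in for the string keys.

```python
from bisect import bisect_right


def contains_by_parity(endpoints, target):
    """Return True if target lies in one of the disjoint ranges.

    endpoints is a flat sorted list [s0, e0, s1, e1, ...] describing
    disjoint half-open ranges [s, e) (an assumption of this sketch).
    """
    # bisect_right returns the zero-based index of the smallest endpoint
    # strictly greater than target. Start points sit at even indices and
    # end points at odd ones, so an odd index means the next endpoint
    # closes a range that the target is inside.
    return bisect_right(endpoints, target) % 2 == 1


endpoints = [10, 20, 30, 40]  # the ranges [10, 20) and [30, 40)
print(contains_by_parity(endpoints, 15))
print(contains_by_parity(endpoints, 25))
```

With closed ranges the same idea works, but a target equal to an end point needs separate handling, which is the kind of implementation detail mentioned above.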

Alternatively, index all the disjoint ranges by their start points; find the target by searching for the largest start-point not greater than the target, and then compare the target with the associated end-point.
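The start-point variant might look like the following sketch. The name `contains_by_start` and the two parallel lists are assumptions made for illustration; a binary search tree keyed on start points would do the same search. Closed ranges are assumed here, and integers stand in for the string keys.

```python
from bisect import bisect_right


def contains_by_start(starts, ends, target):
    """Return True if target lies in one of the disjoint closed ranges.

    starts and ends are parallel sorted lists; range i is [starts[i], ends[i]].
    """
    # Index of the largest start point that is not greater than target.
    i = bisect_right(starts, target) - 1
    # No such start point means target precedes every range.
    return i >= 0 and target <= ends[i]


starts = [10, 30]
ends = [20, 40]
print(contains_by_start(starts, ends, 20))
print(contains_by_start(starts, ends, 25))
```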

I usually use the first approach with sorted vectors, which are a reasonable choice if (a) space utilization is important and (b) insert and merge are relatively rare. With binary search trees, I go for the second approach. But they differ only in details and constants.

Merging and deleting are not difficult, but there are an annoying number of cases. You start by finding the ranges corresponding to the endpoints of the range to be inserted/deleted (using the standard find operation), remove all the ranges in between the two, and fiddle with the endpoints to correct the partially overlapping ranges. While the find operation is always O(log n), the tree/vector manipulation is O(n) (if the inserted/deleted range is large, anyway).
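The find-then-splice procedure described above can be sketched on sorted vectors as follows. This is one possible reading, not the only one: the class name `RangeSet` and its methods are hypothetical, ranges are assumed half-open `[start, end)`, touching ranges are merged on insert, "delete" is interpreted as "stop tracking everything in the given range", and integers stand in for the string keys.

```python
from bisect import bisect_left, bisect_right


class RangeSet:
    """Sorted, disjoint half-open ranges [start, end), kept in two parallel lists."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def add(self, start, end):
        if start >= end:
            return  # empty range, nothing to track
        # Tracked ranges lo..hi-1 overlap or touch [start, end).
        lo = bisect_left(self.ends, start)    # first range with end >= start
        hi = bisect_right(self.starts, end)   # first range with start > end
        if lo < hi:
            start = min(start, self.starts[lo])
            end = max(end, self.ends[hi - 1])
        # Replace the whole run with one merged range (O(n) list splice).
        self.starts[lo:hi] = [start]
        self.ends[lo:hi] = [end]

    def delete(self, start, end):
        if start >= end:
            return
        # Tracked ranges lo..hi-1 strictly overlap [start, end).
        lo = bisect_right(self.ends, start)   # first range with end > start
        hi = bisect_left(self.starts, end)    # first range with start >= end
        if lo >= hi:
            return
        keep_starts, keep_ends = [], []
        if self.starts[lo] < start:           # left remainder survives
            keep_starts.append(self.starts[lo])
            keep_ends.append(start)
        if self.ends[hi - 1] > end:           # right remainder survives
            keep_starts.append(end)
            keep_ends.append(self.ends[hi - 1])
        self.starts[lo:hi] = keep_starts
        self.ends[lo:hi] = keep_ends

    def contains(self, value):
        i = bisect_right(self.starts, value) - 1
        return i >= 0 and value < self.ends[i]

    def ranges(self):
        return list(zip(self.starts, self.ends))


rs = RangeSet()
rs.add(10, 20)
rs.add(30, 40)
rs.add(18, 32)      # bridges both ranges into one
rs.delete(15, 25)   # splits the merged range in two
print(rs.ranges())
```

The two binary searches are the O(log n) part; the slice assignments are where the O(n) cost appears when many ranges are removed or the list has to shift.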

answered Oct 01 '22 13:10

rici

