
Data structure for range query

I was recently asked a coding question on the problem below. I have a couple of solutions, but I am not sure they are the most efficient.


Problem:

Write a program to track a set of text ranges. The start point and end point of each range are strings.

Text range example : [AbA-Ef]
 Aa would fall before this range
 AB would fall inside this range
 etc.

Strings are compared using the order 'A' < 'a' < 'B' < 'b' < ... < 'Z' < 'z'.
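A minimal sketch of this ordering in Python, assuming the strings contain only ASCII letters and that a string sorts before any longer string it is a prefix of (the problem statement does not say either way). The names `char_key` and `text_key` are hypothetical helpers, not part of the original question.

```python
def char_key(ch):
    # Order by letter first, then uppercase before lowercase:
    # 'A' -> ('a', False), 'a' -> ('a', True), 'B' -> ('b', False), ...
    return (ch.lower(), ch.islower())


def text_key(s):
    # Lists of tuples compare element by element, so this gives a
    # lexicographic comparison under the custom character order.
    return [char_key(c) for c in s]


# "Aa" sorts before the start of the range [AbA-Ef].
print(text_key("Aa") < text_key("AbA"))           # True
print(sorted(["b", "A", "B", "a"], key=text_key))  # ['A', 'a', 'B', 'b']
```

Converting every endpoint to such a key once lets any ordered container compare them with the usual `<` operator.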

We need to support the following operations on these ranges:

  • Add range - this should merge ranges where applicable
  • Delete range - this deletes a range from the tracked ranges and recomputes them
  • Query range - given a character, the function should return whether or not it is part of any tracked range

Note that the tracked ranges can be discontinuous.


My solutions:

I came up with two approaches.

  1. Store the ranges as a doubly linked list, or
  2. Store the ranges in some sort of balanced tree whose leaf nodes hold the actual data and are connected to each other as a linked list.

Do you think these solutions are good enough, or can you think of a better way of doing this so that all three operations perform as well as possible?

Lance Reynolds asked Oct 04 '12 04:10


2 Answers

You are probably looking for an interval tree.

Use that data structure with a custom comparator that decides what is in range, and you will be able to do the required operations efficiently.

Note that an interval tree is actually an efficient way to implement your second idea (storing the ranges in some sort of balanced tree).

amit answered Oct 01 '22 13:10


I'm not clear on what the "delete range" operation is supposed to do. Does it:

  • Delete a previously inserted range, and recompute the merge of the remaining ranges?

  • Stop tracking the deleted range, regardless of how many times parts of it have been added?

That doesn't make a huge difference algorithmically; it's just bookkeeping. But it's important to clarify. Also, are the ranges closed or half-open? (That is another detail which doesn't affect the algorithm but does affect the implementation.)

The basic approach to this problem is to merge the tracked set into a sorted list of disjoint (non-overlapping) ranges, stored as a vector, a binary search tree, or basically any structure which supports O(log n) searching.
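As a rough illustration of that merged representation, here is a sketch that collapses arbitrary ranges into a sorted list of disjoint ones. It assumes half-open ranges `[start, end)` and uses integers for brevity; `merge_ranges` is a hypothetical name, and touching ranges are merged as well, which is a design choice rather than something the problem dictates.

```python
def merge_ranges(ranges):
    """Return sorted, disjoint half-open ranges covering the same points."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it if needed.
            if end > merged[-1][1]:
                merged[-1][1] = end
        else:
            # Starts past the previous range: begin a new disjoint range.
            merged.append([start, end])
    return [tuple(r) for r in merged]


print(merge_ranges([(5, 8), (1, 3), (2, 4), (8, 9)]))  # [(1, 4), (5, 9)]
```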

One approach is to put both endpoints of every disjoint range into the data structure. To find out if a target value is in a range, find the (zero-based) index of the smallest endpoint greater than the target. If the index is odd the target is in some range; even means it's outside.
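A sketch of this endpoint-parity lookup using Python's `bisect` module, assuming half-open ranges flattened into one sorted list `[s0, e0, s1, e1, ...]` with zero-based indexing. The function name `in_flat_ranges` is hypothetical.

```python
from bisect import bisect_right


def in_flat_ranges(endpoints, target):
    # bisect_right returns the index of the smallest endpoint > target.
    i = bisect_right(endpoints, target)
    # Odd index: the next endpoint is an end point, so target is inside.
    # Even index: the next endpoint is a start point (or there is none).
    return i % 2 == 1


endpoints = [1, 4, 5, 9]  # the ranges [1, 4) and [5, 9)
print(in_flat_ranges(endpoints, 3))  # True
print(in_flat_ranges(endpoints, 4))  # False
```

With closed ranges the parity test would need an extra equality check against the preceding end point.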

Alternatively, index all the disjoint ranges by their start points; find the target by searching for the largest start-point not greater than the target, and then compare the target with the associated end-point.
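The start-point variant might look like the sketch below. It assumes closed ranges this time, kept in two parallel sorted lists standing in for whatever ordered map is actually used; `in_indexed_ranges`, `starts` and `ends` are hypothetical names.

```python
from bisect import bisect_right


def in_indexed_ranges(starts, ends, target):
    # Index of the largest start point not greater than target (-1 if none).
    i = bisect_right(starts, target) - 1
    # Closed ranges: the end point itself counts as inside.
    return i >= 0 and target <= ends[i]


starts, ends = [1, 5], [4, 9]  # the ranges [1, 4] and [5, 9]
print(in_indexed_ranges(starts, ends, 4))   # True
print(in_indexed_ranges(starts, ends, 10))  # False
```

Changing `<=` to `<` in the final comparison switches the check to half-open ranges.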

I usually use the first approach with sorted vectors, which are a reasonable choice if (a) space utilization is important and (b) insert and merge are relatively rare. With binary search trees, I go for the second approach. But they differ only in details and constants.

Merging and deleting are not difficult, but there are an annoying number of cases. You start by finding the ranges corresponding to the endpoints of the range to be inserted/deleted (using the standard find operation), remove all the ranges in between the two, and fiddle with the endpoints to correct the partially overlapping ranges. While the find operation is always O(log n), the tree/vector manipulation is O(n) (if the inserted/deleted range is large, anyway).
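Those steps can be sketched as follows on sorted vectors. This is an illustration under stated assumptions, not a definitive implementation: ranges are half-open `[start, end)`, touching ranges are merged on insert, and "delete" means "stop tracking every point in the given range" (the second interpretation above). The class `RangeSet` and its method names are hypothetical, and integers are used for brevity.

```python
from bisect import bisect_left, bisect_right


class RangeSet:
    """Sorted, disjoint half-open ranges kept in two parallel lists."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def add(self, start, end):
        if start >= end:
            return  # empty range, nothing to track
        # Tracked ranges lo..hi-1 overlap or touch [start, end).
        lo = bisect_left(self.ends, start)   # first range with end >= start
        hi = bisect_right(self.starts, end)  # first range with start > end
        if lo < hi:
            start = min(start, self.starts[lo])
            end = max(end, self.ends[hi - 1])
        # Replace the affected ranges with the single merged one: O(n).
        self.starts[lo:hi] = [start]
        self.ends[lo:hi] = [end]

    def delete(self, start, end):
        # Tracked ranges lo..hi-1 strictly overlap [start, end).
        lo = bisect_right(self.ends, start)  # first range with end > start
        hi = bisect_left(self.starts, end)   # first range with start >= end
        if lo >= hi:
            return
        new_starts, new_ends = [], []
        if self.starts[lo] < start:          # keep the piece left of the cut
            new_starts.append(self.starts[lo])
            new_ends.append(start)
        if self.ends[hi - 1] > end:          # keep the piece right of the cut
            new_starts.append(end)
            new_ends.append(self.ends[hi - 1])
        self.starts[lo:hi] = new_starts
        self.ends[lo:hi] = new_ends

    def contains(self, x):
        i = bisect_right(self.starts, x) - 1  # largest start <= x
        return i >= 0 and x < self.ends[i]

    def ranges(self):
        return list(zip(self.starts, self.ends))


rs = RangeSet()
rs.add(1, 3)
rs.add(5, 8)
rs.add(2, 6)        # bridges the two tracked ranges
print(rs.ranges())  # [(1, 8)]
rs.delete(3, 4)     # splits the range in two
print(rs.ranges())  # [(1, 3), (4, 8)]
```

The binary searches are O(log n), while the slice assignments shift list elements and are therefore linear in the worst case, matching the costs described above. For the text ranges in the question, the endpoints would first need to be mapped to keys that follow the custom character order.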

rici answered Oct 01 '22 13:10