What is the most efficient algorithm/data structure for finding the smallest range containing a point?

Tags:

hilbert-curve

Given a data set of a few millions of price ranges, we need to find the smallest range that contains a given price.
The following rules apply:

Ranges can be fully nested (ie, 1-10 and 5-10 is valid)
Ranges cannot be partially nested (ie, 1-10 and 5-15 is invalid)

Example:
Given the following price ranges:

1-100
50-100
100-120
5-10
5-20

The result for searching price 7 should be 5-10
The result for searching price 100 should be 100-120 (smallest range containing 100).

What's the most efficient algorithm/data structure to implement this?
Searching the web, I only found solutions for searching ranges within ranges.
I've been looking at Morton count and Hilbert curve, but can't wrap my head around how to use them for this case.
Thanks.

515

asked Aug 13 '18 07:08

user1102018

2 Answers

Because you did not mention this ad hoc algorithm, I'll propose this as a simple answer to your question:

This is a python function, but it's fairly easy to understand and convert it in another language.

def min_range(ranges, value):
    # ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20)]
    # value = 100

    # INIT
    import math
    best_range = None
    best_range_len = math.inf

    # LOOP THROUGH ALL RANGES
    for b, e in ranges:

        # PICK THE SMALLEST
        if b <= value <= e and e - b < best_range_len:
            best_range = (b, e)
            best_range_len = e - b

    print(f'Minimal range containing {value} = {best_range}')

I believe there are more efficient and complicated solutions (if you can do some precomputation for example) but this is the first step you must take.

EDIT : Here is a better solution, probably in O(log(n)) but it's not trivial. It is a tree where each node is an interval, and has a child list of all strictly non overlapping intervals that are contained inside him. Preprocessing is done in O(n log(n)) time and queries are O(n) in worst case (when you can't find 2 ranges that don't overlap) and probably O(log(n)) in average.

2 classes: Tree that holds the tree and can query:

class tree:
    def __init__(self, ranges):
        # sort the ranges by lowest starting and then greatest ending
        ranges = sorted(ranges, key=lambda i: (i[0], -i[1]))
        # recursive building -> might want to optimize that in python
        self.node = node( (-float('inf'), float('inf')) , ranges)

    def __str__(self):
        return str(self.node)

    def query(self, value):
        # bisect is for binary search
        import bisect
        curr_sol = self.node.inter
        node_list = self.node.child_list

        while True:
            # which of the child ranges can include our value ?
            i = bisect.bisect_left(node_list, (value, float('inf'))) - 1
            # does it includes it ?
            if i < 0 or i == len(node_list):
                return curr_sol
            if value > node_list[i].inter[1]:
                return curr_sol
            else:
                # if it does then go deeper
                curr_sol = node_list[i].inter
                node_list = node_list[i].child_list

Node that holds the structure and information:

class node:
    def __init__(self, inter, ranges):
        # all elements in ranges will be descendant of this node !
        import bisect

        self.inter = inter
        self.child_list = []

        for i, r in enumerate(ranges):
            if len(self.child_list) == 0:
                # append a new child when list is empty
                self.child_list.append(node(r, ranges[i + 1:bisect.bisect_left(ranges, (r[1], r[1] - 1))]))

            else:
                # the current range r is included in a previous range 
                # r is not a child of self but a descendant !
                if r[0] < self.child_list[-1].inter[1]:
                    continue
                # else -> this is a new child
                self.child_list.append(node(r, ranges[i + 1:bisect.bisect_left(ranges, (r[1], r[1] - 1))]))

    def __str__(self):
        # fancy
        return f'{self.inter} : [{", ".join([str(n) for n in self.child_list])}]'

    def __lt__(self, other):
        # this is '<' operator -> for bisect to compare our items
        return self.inter < other

and to test that:

ranges = [(1, 100), (50, 100), (100, 120), (5, 10), (5, 20), (50, 51)]
t = tree(ranges)
print(t)
print(t.query(10))
print(t.query(5))
print(t.query(40))
print(t.query(50))

answered Nov 01 '22 10:11

pLOPeGG

Preprocessing that generates disjoined intervals
(I call source segments as ranges and resulting segments as intervals)

For ever range border (both start and end) make tuple: (value, start/end fiels, range length, id), put them in array/list

Sort these tuples by the first field. In case of tie make longer range left for start and right for end.

Make a stack
Make StartValue variable.
Walk through the list:
     if current tuple contains start:
          if interval is opened:   //we close it
             if  current value > StartValue:   //interval is not empty
                  make interval with   //note id remains in stack
                      (start=StartValue, end = current value, id = stack.peek)       
                  add interval to result list
          StartValue = current value //we open new interval
          push id from current tuple onto stack
     else:   //end of range
             if  current value > StartValue:   //interval is not empty
                 make interval with    //note id is removed from stack
                      (start=StartValue, end = current value, id = stack.pop)
                 add interval to result list
         if stack is not empty:
              StartValue = current value //we open new interval

After that we have sorted list of disjointed intervals containing start/end value and id of the source range (note that many intervals might correspond to the same source range), so we can use binary search easily.

If we add source ranges one-by-one in nested order (nested after it parent), we can see that every new range might generate at most two new intervals, so overall number of intervals M <= 2*N and overall complexity is O(Nlog N + Q * logN) where Q is number of queries

Edit: Added if stack is not empty section

Result for your example 1-100, 50-100, 100-120, 5-10, 5-20 is

1-5(0), 5-10(3), 10-20(4), 20-50(0), 50-100(1), 100-120(2)

answered Nov 01 '22 08:11

MBo

Related questions
                            
                                Good algorithm for finding subsets of point sets
                            
                                Factoring a number into roughly equal factors
                            
                                Is there any larger significance to this piece of translated assembly code?
                            
                                Why my code is much slower than opencv for a simple StereoBM algorithm?
                            
                                Matches overlapping lookahead on LZ77/LZSS with suffix trees
                            
                                Extending Python's os.walk function on FTP server
                            
                                Finding a "movement direction" (angle) of a point
                            
                                Algorithm to find top K paths in graph, with no common vertices, negative weights?
                            
                                Multichannel blind deconvolution in the simplest formulation: how to solve?
                            
                                How to divide by 9 using just shifts/add/sub?
                            
                                Finding edges in a graph with similar lengths
                            
                                What's the good of using 3 states for a vertex in DFS?
                            
                                Finding a irreducible fraction
                            
                                Find the subset of a set of integers that has the maximum product
                            
                                Round Robin scheduling in java
                            
                                How does Array.sort() behave if comparison function is not transitive?
                            
                                Determine the closest free set of X/Y coordinates within an ever expanding 2D grid stored in MySQL
                            
                                How does raft handle committing entries from previous one?
                            
                                Constructing a quadtree such that there is only one level difference between adjacent nodes (LOD)
                            
                                O(n) time smallest spanning window combination of the elements in k sorted arrays

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With