Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

finding a set of ranges that a number fall in

I have a 200k lines list of number ranges like start_position,stop position. The list includes all kinds of overlaps in addition to nonoverlapping ones.

the list looks like this

  • [3,5]
  • [10,30]
  • [15,25]
  • [5,15]
  • [25,35]
  • ...

I need to find the ranges that a given number fall in. And will repeat it for 100k numbers. For example if 18 is the given number with the list above then the function should return [10,30] [15,25]

I am doing it in a overly complicated way using bisect, can anybody give a clue on how to do it in a faster way.

Thanks

like image 406
svural Avatar asked Aug 12 '13 04:08

svural


People also ask

How to create range of numbers?

The range is the difference between the smallest and highest numbers in a list or set. To find the range, first put all the numbers in order. Then subtract (take away) the lowest number from the highest. The answer gives you the range of the list.

How do you find the range of a list in Python?

Using range() will get the list of even numbers in the range given as input. The parameters for range() are, start is 2, stop is 20, and step is 2, so the values will be incremented by 2 and will give even numbers till stop-2.


3 Answers

It seems a range coverage problem. Since you have a large amount queries to process, you need to give the result as quickly as possible. There is an algorithm relating to this problem, you can take a look at the Segment Tree.

The idea is to build a Segment Tree first based on the intervals given, and then for each query, you can do as fast as log(N), where N is the number of intervals.

To get all possible k intervals, first find the parent node covering all sub intervals with log(n), and then traverse its children to retrieve all k intervals. So the time complexity of retrieving all intervals for a given number is bounded by log(N) + O(k), where k << n.

This algorithm could be deteriorated to as slow as O(N) when all intervals contains the given number. In this case, since the size of output is N, there does not exist any better solution. Because the complexity of the algorithm must be at least at the same order or higher than the output size.

Hope it helps.

like image 51
zs2020 Avatar answered Oct 07 '22 06:10

zs2020


The best way to solve this is for you to construct an Interval tree. (The range tree given by sza is static. Which means after you construct it you can't add/delete nodes. If that is ok with you then his answer will suffice.)

You can learn about interval tree here:

Youtube: https://www.youtube.com/watch?v=q0QOYtSsTg4

Wiki: https://en.wikipedia.org/wiki/Interval_tree

An interval tree is basically a binary tree based on the left value of the range tuple. ([left, right]) What's special about it is that each node in the tree has a value named max. The max value holds the biggest right value of the nodes that are below this node, including the node itself. In other words, the current node will have it's max value set to the biggest right value of the subtree which the current node is the root.

To expand this for your problem, we can add minimum value as well. Where the min value will hold the smallest left value of all the subtree where this node is the root.

Edited screencap from the video: (sorry for the quality)

enter image description here

That means when we query for a number you:

1) add current node to set of answers if number is inside range    
2) go left if number is less than max value of left child.    
3) go right if number is greater than min value of right child.
like image 27
bunnybare Avatar answered Oct 07 '22 06:10

bunnybare


import random as r
rng = range(99)
lefts = [r.choice(rng) for x in range(0, 200000)]
segments = [(x, r.choice(range(x+1, 100))) for x in lefts]

leftList = sorted(segments, key=lambda x:x[0])
rightList = sorted(segments, key=lambda x:x[1])

def returnSegments(item, segments, leftList, rightList):
    for leftN in range(len(leftList)):
        if leftList[leftN][0] > item:
            break
    for rightN in range(len(rightList)):
        if rightList[rightN][1] > item:
            break
    return list(set(leftList[:leftN]) & set(rightList[rightN:]))

def testReturnSegments():
    for item in range(0,100):
        item = item % 100
        returnSegments(item, segments, leftList, rightList)

This code uses 200k segments, and 100 numbers. It ran on my 2.3 GHz i5 macbook in 9.3 seconds, meaning for a full 200k x 100k run, you'd need 2.5 hours. Probably not fast enough, but there may be ways to speed this approach up - I'll think about it.

like image 2
Brionius Avatar answered Oct 07 '22 06:10

Brionius