Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Intersection of two lists of ranges in Python

A friend of mine passed me over an interview question he recently got and I wasn't very happy with my approach to the solution. The question is as follows:

  • You have two lists.
  • Each list will contain lists of length 2, which represent a range (ie. [3,5] means a range from 3 to 5, inclusive).
  • You need to return the intersection of all ranges between the sets. If I give you [1,5] and [0,2], the result would be [1,2].
  • Within each list, the ranges will always increase and never overlap (i.e. it will be [[0, 2], [5, 10] ... ] never [[0,2], [2,5] ... ])

In general there are no "gotchas" in terms of the ordering or overlapping of the lists.

Example:

a = [[0, 2], [5, 10], [13, 23], [24, 25]]
b = [[1, 5], [8, 12], [15, 18], [20, 24]]

Expected output: [[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]

My lazy solution involved spreading the list of ranges into a list of integers then doing a set intersection, like this:

def get_intersection(x, y):
    x_spread = [item for sublist in [list(range(l[0],l[1]+1)) for l in x] for item in sublist]
    y_spread = [item for sublist in [list(range(l[0],l[1]+1)) for l in y] for item in sublist]
    flat_intersect_list = list(set(x_spread).intersection(y_spread))
...

But I imagine there's a solution that's both readable and more efficient.

Please explain how you would mentally tackle this problem, if you don't mind. A time/space complexity analysis would also be helpful.

Thanks

like image 507
Vishal Avatar asked Nov 01 '16 19:11

Vishal


2 Answers

[[max(first[0], second[0]), min(first[1], second[1])] 
  for first in a for second in b 
  if max(first[0], second[0]) <= min(first[1], second[1])]

A list comprehension which gives the answer: [[1, 2], [5, 5], [8, 10], [15, 18], [20, 23], [24, 24]]

Breaking it down:

[[max(first[0], second[0]), min(first[1], second[1])] 

Maximum of the first term, Min of the 2nd term

for first in a for second in b 

For all combinations of first and second term:

if max(first[0], second[0]) <= min(first[1], second[1])]

Only if the max of the first does not exceed the minimum of the second.


If you need the output compacted, then the following function does that (In O(n^2) time because deletion from a list is O(n), a step we perform O(n) times):

def reverse_compact(lst):
    for index in range(len(lst) - 2,-1,-1):
        if lst[index][1] + 1 >= lst[index + 1][0]:
            lst[index][1] = lst[index + 1][1]
            del lst[index + 1]  # remove compacted entry O(n)*
    return lst

It joins ranges which touch, given they are in-order. It does it in reverse because then we can do this operation in place and delete the compacted entries as we go. If we didn't do it in reverse, deleting other entries would muck with our index.

>>> reverse_compact(comp)
[[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]
  • The compacting function can be reduced further to O(n) by doing a forward in place compaction and copying back the elements, as then each inner step is O(1) (get/set instead of del), but this is less readable:

This runs in O(n) time and space complexity:

def compact(lst):
    next_index = 0  # Keeps track of the last used index in our result
    for index in range(len(lst) - 1):
        if lst[next_index][1] + 1 >= lst[index + 1][0]:
            lst[next_index][1] = lst[index + 1][1]
        else:    
            next_index += 1
            lst[next_index] = lst[index + 1]
    return lst[:next_index + 1]

Using either compactor, the list comprehension is the dominating term here, with time =O(n*m), space = O(m+n), as it compares all possible combinations of the two lists with no early outs. This does not take advantage of the ordered structure of the lists given in the prompt: you could exploit that structure to reduce the time complexity to O(n + m) as they always increase and never overlap, meaning you can do all comparisons in a single pass.


Note there is more than one solution and hopefully you can solve the problem and then iteratively improve upon it.

A 100% correct answer which satisfies all possible inputs is not the goal of an interview question. It is to see how a person thinks and handles challenges, and whether they can reason about a solution.

In fact, if you give me a 100% correct, textbook answer, it's probably because you've seen the question before and you already know the solution... and therefore that question isn't helpful to me as an interviewer. 'Check, can regurgitate solutions found on StackOverflow.' The idea is to watch you solve a problem, not regurgitate a solution.

Too many candidates miss the forest for the trees: Acknowledging shortcomings and suggesting solutions is the right way to go about an answer to an interview questions. You don't have to have a solution, you have to show how you would approach the problem.

Your solution is fine if you can explain it and detail potential issues with using it.

I got my current job by failing to answer an interview question: After spending the majority of my time trying, I explained why my approach didn't work and the second approach I would try given more time, along with potential pitfalls I saw in that approach (and why I opted for my first strategy initially).

like image 200
TemporalWolf Avatar answered Sep 21 '22 07:09

TemporalWolf


OP, I believe this solution works, and it runs in O(m+n) time where m and n are the lengths of the lists. (To be sure, make ranges a linked list so that changing its length runs in constant time.)

def intersections(a,b):
    ranges = []
    i = j = 0
    while i < len(a) and j < len(b):
        a_left, a_right = a[i]
        b_left, b_right = b[j]

        if a_right < b_right:
            i += 1
        else:
            j += 1

        if a_right >= b_left and b_right >= a_left:
            end_pts = sorted([a_left, a_right, b_left, b_right])
            middle = [end_pts[1], end_pts[2]]
            ranges.append(middle)

    ri = 0
    while ri < len(ranges)-1:
        if ranges[ri][1] == ranges[ri+1][0]:
            ranges[ri:ri+2] = [[ranges[ri][0], ranges[ri+1][1]]]

        ri += 1

    return ranges

a = [[0,2], [5,10], [13,23], [24,25]]
b = [[1,5], [8,12], [15,18], [20,24]]
print(intersects(a,b))
# [[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]
like image 28
BallpointBen Avatar answered Sep 21 '22 07:09

BallpointBen