Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to find longest incrementing subsequence in a list of lists

I am doing some signal analysis, a part of which is to find the longest subsequence

I have dictionary like the following:

sequenceDict = {
    0: [168, 360, 470],
    1: [279, 361, 471, 633, 729, 817],
    2: [32, 168, 170, 350, 634, 730, 818],
    3: [33, 155, 171, 363, 635, 731, 765, 819],
    4: [352, 364, 732, 766, 822],
    5: [157, 173, 353, 577, 637, 733, 823, 969],
    6: [158, 174, 578, 638, 706, 734, 824],
    7: [159, 175, 579, 707, 735],
    8: [160, 464, 640, 708, 826],
    9: [173, 709, 757, 827],
    10: [174, 540, 642, 666, 710],
    11: [253, 667, 711],
    12: [254, 304, 668],
    13: [181, 255, 831],
    14: [256, 340, 646, 832],
    16: [184, 416], 
    17: [417], 
    18: [418], 
    19: [875], 
    20: [876], 
    23: [217], 
    24: [168, 218, 880], 
    25: [219, 765, 881], 
    26: [220, 766], 
    27: [221], 
    28: [768], 
    29: [3, 769], 
    30: [344, 476, 706]}

These are essentially always sorted indices of another array, I would like to find the longest incrementing sequence ( just like a longest increasing subsequence), by picking only one number from each key sequentially ( key 2 comes right after key 1 and so on), for example, from keys 0 and 1, [360, 361] is one sequence, and [470, 471] is another. I am calling these incrementing sequence, as these numbers should strictly increase by 1.

I have looked at things like patience sorting and so on, but since this problem is slightly different, and also has a tree of sequences, are there any known python implementations, or other efficient ways to do this except generating all possible sequences from this dict and then running patience sort?

like image 321
Sahil M Avatar asked Mar 14 '23 20:03

Sahil M


2 Answers

I would just implement a "brute-force" solution...

  1. keep a list of "current sequences", initially empty
  2. for each key check if any of the current sequences can be extended by one step. When increasing a sequence update also a best-so-far solution.
  3. for any number that was not used to extend a sequence start a new sequence of length 1

Python provides set that may be a reasonable choice... this is a sample implementation:

best = None
current_sequences = set()
last_key = None
for key in sorted(sequenceDict.keys()):
    data = set(sequenceDict[key])
    new_sequences = set()
    if last_key == key-1:
        # no gap in key value, may be some sequence got extended
        for val, count in current_sequences:
            if val+1 in data:
                # found a continuation, keep this sequence
                new_sequences.add((val+1, count+1))
                data.remove(val+1)
                if best is None or count+1 > best[0]:
                    # we've got a new champion
                    best = count+1, val+1, key
    # add new sequences starting here
    for v in data:
        new_sequences.add((v, 1))
        if best is None:
            best = 1, v, key
    current_sequences = new_sequences
    last_key = key

One tricky part is that if there is a gap in the keys then you cannot extend a sequence, this is what last_key is used for.

Complexity should be O(input_size × average_number_of_sequences). Mine is just a gut feeling, but my guess is that you cannot get lower than that. I was tempted by the idea of using value - key to associate a constant value with each sequence... however this would not detect "gaps" (i.e. value 100 in key 1 and value 102 in key 3, but without 101 in key 2).

With the question input the solution is (7, 735, 7) meaning a 7-elements sequence ending with value 735 at key 7.

like image 137
6502 Avatar answered Apr 08 '23 12:04

6502


In contrast to @6502's solution, this one not only keeping the best solution but also keeps track of every incrementing subsequences, if this is more helpful.

The idea is a similar to the sliding window approach. You start from the first list, update currentHotItems and globalHotItems dictionaries and then see the second list and update the dictionaries again, etc.

# fill missing indexes in the dictionary:
for i in range(min(sequenceDict), max(sequenceDict)):
    if i not in sequenceDict:
        sequenceDict[i] = []

# get only lists, ordered:
sortedItems = map(lambda x:x[1], sorted(sequenceDict.items(), key=lambda x:x[0]))    
globalHotItems = {} # (value, startIndex): length
currentHotItems = {} # value: length

for i in range(len(sortedItems)):
    updatedHotItems = {} # updated value: length
    for item in sortedItems[i]:
        if (item - 1) in currentHotItems:
            updatedHotItems[item] = currentHotItems[item-1] + 1
        else:
            updatedHotItems[item] = 1

    deadSet = set(currentHotItems.keys()) - \
            set(updatedHotItems.keys() + [key - 1 for key in updatedHotItems.keys()])

    for item in deadSet:
        globalHotItems[ (item-currentHotItems[item]+1, i-currentHotItems[item]) ] = currentHotItems[item]

    currentHotItems = updatedHotItems

print sorted(globalHotItems.items(), key=lambda x:x[1])[-1]

globalHotItems is the dictionary which contains the result. Keys are (value, startIndex) and Value's are the length.

For example, the last 4 items in globalHotItems:

print sorted(globalHotItems.items(), key=lambda x:x[1])[-4:]

is:

[((157, 5), 4), ((217, 23), 5), ((706, 6), 6), ((729, 1), 7)]

it means the best solution is length 7 and starts in the index=1 list as 729. And the best second solution is length 6 and start in index=6 list as 706, etc.

Complexity:

I think complexity should be again: O(input_size × average_number_of_sequences)

like image 43
Sait Avatar answered Apr 08 '23 11:04

Sait