I am doing some signal analysis, a part of which is to find the longest subsequence
I have dictionary like the following:
sequenceDict = {
0: [168, 360, 470],
1: [279, 361, 471, 633, 729, 817],
2: [32, 168, 170, 350, 634, 730, 818],
3: [33, 155, 171, 363, 635, 731, 765, 819],
4: [352, 364, 732, 766, 822],
5: [157, 173, 353, 577, 637, 733, 823, 969],
6: [158, 174, 578, 638, 706, 734, 824],
7: [159, 175, 579, 707, 735],
8: [160, 464, 640, 708, 826],
9: [173, 709, 757, 827],
10: [174, 540, 642, 666, 710],
11: [253, 667, 711],
12: [254, 304, 668],
13: [181, 255, 831],
14: [256, 340, 646, 832],
16: [184, 416],
17: [417],
18: [418],
19: [875],
20: [876],
23: [217],
24: [168, 218, 880],
25: [219, 765, 881],
26: [220, 766],
27: [221],
28: [768],
29: [3, 769],
30: [344, 476, 706]}
These are essentially always sorted indices of another array, I would like to find the longest incrementing sequence ( just like a longest increasing subsequence), by picking only one number from each key sequentially ( key 2 comes right after key 1 and so on), for example, from keys 0 and 1, [360, 361] is one sequence, and [470, 471] is another. I am calling these incrementing sequence, as these numbers should strictly increase by 1.
I have looked at things like patience sorting and so on, but since this problem is slightly different, and also has a tree of sequences, are there any known python implementations, or other efficient ways to do this except generating all possible sequences from this dict and then running patience sort?
I would just implement a "brute-force" solution...
Python provides set
that may be a reasonable choice... this is a sample implementation:
best = None
current_sequences = set()
last_key = None
for key in sorted(sequenceDict.keys()):
data = set(sequenceDict[key])
new_sequences = set()
if last_key == key-1:
# no gap in key value, may be some sequence got extended
for val, count in current_sequences:
if val+1 in data:
# found a continuation, keep this sequence
new_sequences.add((val+1, count+1))
data.remove(val+1)
if best is None or count+1 > best[0]:
# we've got a new champion
best = count+1, val+1, key
# add new sequences starting here
for v in data:
new_sequences.add((v, 1))
if best is None:
best = 1, v, key
current_sequences = new_sequences
last_key = key
One tricky part is that if there is a gap in the keys then you cannot extend a sequence, this is what last_key
is used for.
Complexity should be O(input_size × average_number_of_sequences)
. Mine is just a gut feeling, but my guess is that you cannot get lower than that. I was tempted by the idea of using value - key
to associate a constant value with each sequence... however this would not detect "gaps" (i.e. value 100 in key 1 and value 102 in key 3, but without 101 in key 2).
With the question input the solution is (7, 735, 7)
meaning a 7-elements sequence ending with value 735 at key 7.
In contrast to @6502's solution, this one not only keeping the best solution but also keeps track of every incrementing subsequences, if this is more helpful.
The idea is a similar to the sliding window approach. You start from the first list, update currentHotItems
and globalHotItems
dictionaries and then see the second list and update the dictionaries again, etc.
# fill missing indexes in the dictionary:
for i in range(min(sequenceDict), max(sequenceDict)):
if i not in sequenceDict:
sequenceDict[i] = []
# get only lists, ordered:
sortedItems = map(lambda x:x[1], sorted(sequenceDict.items(), key=lambda x:x[0]))
globalHotItems = {} # (value, startIndex): length
currentHotItems = {} # value: length
for i in range(len(sortedItems)):
updatedHotItems = {} # updated value: length
for item in sortedItems[i]:
if (item - 1) in currentHotItems:
updatedHotItems[item] = currentHotItems[item-1] + 1
else:
updatedHotItems[item] = 1
deadSet = set(currentHotItems.keys()) - \
set(updatedHotItems.keys() + [key - 1 for key in updatedHotItems.keys()])
for item in deadSet:
globalHotItems[ (item-currentHotItems[item]+1, i-currentHotItems[item]) ] = currentHotItems[item]
currentHotItems = updatedHotItems
print sorted(globalHotItems.items(), key=lambda x:x[1])[-1]
globalHotItems
is the dictionary which contains the result. Keys are (value, startIndex) and Value's are the length.
For example, the last 4 items in globalHotItems
:
print sorted(globalHotItems.items(), key=lambda x:x[1])[-4:]
is:
[((157, 5), 4), ((217, 23), 5), ((706, 6), 6), ((729, 1), 7)]
it means the best solution is length 7 and starts in the index=1
list as 729. And the best second solution is length 6 and start in index=6
list as 706, etc.
Complexity:
I think complexity should be again: O(input_size × average_number_of_sequences)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With