Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to determine if a sequence is in another sequence?

This is a generalization of the "string contains substring" problem to (more) arbitrary types.

Given an sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:

Example usage (Sequence in Sequence):

>>> seq_in_seq([5,6],  [4,'a',3,5,6]) 3 >>> seq_in_seq([5,7],  [4,'a',3,5,6]) -1 # or None, or whatever 

So far, I just rely on brute force and it seems slow, ugly, and clumsy.

like image 458
Gregg Lind Avatar asked Jan 08 '09 19:01

Gregg Lind


People also ask

How do you check if a list contains a sequence in Python?

In python, the in operator can be used to check the presence of a value in a sequence(string, list, tuple, set, dictionary). It returns a boolean value 'True' if the item is found in the sequence else it returns False. Let us consider a list my_list = ["bat", 3, "apple", "dog", 4.8 ].


2 Answers

I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/

It finds all the correct subsequences in a given sequence, and should be used as an iterator:

>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s 3 >>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s (nothing) 
like image 80
Federico A. Ramponi Avatar answered Sep 24 '22 15:09

Federico A. Ramponi


Here's a brute-force approach O(n*m) (similar to @mcella's answer). It might be faster than the Knuth-Morris-Pratt algorithm implementation in pure Python O(n+m) (see @Gregg Lind answer) for small input sequences.

#!/usr/bin/env python def index(subseq, seq):     """Return an index of `subseq`uence in the `seq`uence.      Or `-1` if `subseq` is not a subsequence of the `seq`.      The time complexity of the algorithm is O(n*m), where          n, m = len(seq), len(subseq)      >>> index([1,2], range(5))     1     >>> index(range(1, 6), range(5))     -1     >>> index(range(5), range(5))     0     >>> index([1,2], [0, 1, 0, 1, 2])     3     """     i, n, m = -1, len(seq), len(subseq)     try:         while True:             i = seq.index(subseq[0], i + 1, n - m + 1)             if subseq == seq[i:i + m]:                return i     except ValueError:         return -1  if __name__ == '__main__':     import doctest; doctest.testmod() 

I wonder how large is the small in this case?

like image 31
jfs Avatar answered Sep 21 '22 15:09

jfs