Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a Python builtin for determining if an iterable contained a certain sequence?

Tags:

python

For example, something like:

>>> [1, 2, 3].contains_sequence([1, 2])
True
>>> [1, 2, 3].contains_sequence([4])
False

I know that the in operator can do this for strings:

>>> "12" in "123"
True

But I'm looking for something that operates on iterables.

like image 680
David Wolever Avatar asked Jun 21 '12 03:06

David Wolever


People also ask

How do you test iterable?

The only reliable way to determine whether an object is iterable is to call iter(obj) . From "Fluent Python" by Luciano Ramalho: As of Python 3.4, the most accurate way to check whether an object x is iterable is to call iter(x) and handle a TypeError exception if it isn't.

Are all sequences are iterable in Python?

Many things in Python are iterables, but not all of them are sequences. An iterator is an object representing a stream of data. It does the iterating over an iterable. You can use an iterator to get the next value or to loop over it.

Is a sequence An iterable?

Iterables can be looped over, and anything that can be looped over is an iterable. Sequences are a very common type of iterable. Lists, tuples, and strings are all sequences. Sequences are iterables that have a specific set of features.


6 Answers

Referenced from https://stackoverflow.com/a/6822773/24718 modified to use a list.

from itertools import islice

def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the iterable
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   
    """
    it = iter(seq)
    result = list(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + [elem]
        yield result

def contains_sequence(all_values, seq):
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))            

test_iterable = [1,2,3]
search_sequence = [1,2]

result = contains_sequence(test_iterable, search_sequence)
like image 167
monkut Avatar answered Nov 09 '22 22:11

monkut


Is there a Python builtin? No. You can accomplish this task in various ways. Here is a recipe that does it, and also gives you the position of the subsequence in the containing sequence:

def _search(forward, source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
        end = m
    else:
        end = min(end, m)
    if n == 0 or (end-start) < n:
        # target is empty, or longer than source, so obviously can't be found.
        return None
    if forward:
        x = range(start, end-n+1)
    else:
        x = range(end-n, start-1, -1)
    for i in x:
        if source[i:i+n] == target:
            return i
    return None
like image 30
BrenBarn Avatar answered Nov 09 '22 21:11

BrenBarn


As far as I know, there's no way to do this. You can roll your own function pretty easily, but I doubt that will be terribly efficient.

>>> def contains_seq(seq,subseq):
...     #try: junk=seq[:]
...     #except: seq=tuple(seq)
...     #try: junk=subseq[:]
...     #except: subseq=tuple(subseq)
...     ll=len(subseq)
...     for i in range(len(seq)-ll):  #on python2, use xrange.
...         if(seq[i:i+ll] == subseq):
...             return True
...     return False
...
>>> contains_seq(range(10),range(3)) #True
>>> contains_seq(range(10),[2,3,6]) #False

Note that this solution does not work with generator type objects (it only works on objects that you can slice). You could check seq to see if it is sliceable before proceeding and cast to a tuple if it isn't sliceable -- But then you get rid of the benefits of slicing. You could re-write it to check one element at a time instead of using slicing, but I have a feeling performance would suffer even more.

like image 38
mgilson Avatar answered Nov 09 '22 21:11

mgilson


If preserving of order is not necessary, you can use sets (builtin):

>>> set([1,2]).issubset([1,2,3])
True
>>> set([4]).issubset([1,2,3])
False

Otherwise:

def is_subsequence(sub, iterable):
    sub_pos, sub_len = 0, len(sub)
    for i in iterable:
        if i == sub[sub_pos]:
            sub_pos += 1
            if sub_pos >= sub_len:
                return True
        else:
            sub_pos = 0
    return False

>>> is_subsequence([1,2], [0,1,2,3,4])
True
>>> is_subsequence([2,1], [0,1,2,3,4]) # order preserved
False
>>> is_subsequence([1,2,4], [0,1,2,3,4])
False

This one works with any iterator.

like image 38
Aleksei astynax Pirogov Avatar answered Nov 09 '22 23:11

Aleksei astynax Pirogov


As others have said, there's no builtin for this. Here's an implementation that is potentially more efficient than the other answers I've seen -- in particular, it scans through the iterable, just keeping track of what prefix sizes of the target sequence it's seen. But that increased efficiency comes at some expense in increased verbosity over some of the other approaches that have been suggested.

def contains_seq(iterable, seq):
    """
    Returns true if the iterable contains the given sequence.
    """
    # The following clause is optional -- leave it if you want to allow `seq` to
    # be an arbitrary iterable; or remove it if `seq` will always be list-like.
    if not isinstance(seq, collections.Sequence):
        seq = tuple(seq)

    if len(seq)==0: return True # corner case

    partial_matches = []
    for elt in iterable:
        # Try extending each of the partial matches by adding the
        # next element, if it matches.
        partial_matches = [m+1 for m in partial_matches if elt == seq[m]]
        # Check if we should start a new partial match
        if elt==seq[0]:
            partial_matches.append(1)
        # Check if we have a complete match (partial_matches will always
        # be sorted from highest to lowest, since older partial matches 
        # come before newer ones).
        if partial_matches and partial_matches[0]==len(seq):
            return True
    # No match found.
    return False
like image 32
Edward Loper Avatar answered Nov 09 '22 22:11

Edward Loper


deque appears to be useful here:

from collections import deque

def contains(it, seq):
    seq = deque(seq)
    deq = deque(maxlen=len(seq))
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False

Note that this accepts arbitrary iterables for both arguments (no slicing required).

like image 24
georg Avatar answered Nov 09 '22 21:11

georg