How can I find the number of overlapping sequences in a String with Python? [duplicate]

Question

I have a long sequence, and I would like to know how often some sub-sequences occur in this sequence.

I know string.count(s, sub), but it only counts non-overlapping sequences.

Does a similar function which also counts overlapping sequences exist?

NPE · Accepted Answer

As an alternative to writing your own search function, you could use the re module:

In [22]: import re

In [23]: haystack = 'abababa baba alibababa'

In [24]: needle = 'baba'

In [25]: matches = re.finditer(r'(?=(%s))' % re.escape(needle), haystack)

In [26]: print [m.start(1) for m in matches]
[1, 3, 8, 16, 18]

The above prints out the starting positions of all (potentially overlapping) matches.

If all you need is the count, the following should do the trick:

In [27]: len(re.findall(r'(?=(%s))' % re.escape(needle), haystack))
Out[27]: 5

e-satis · Answer

A simple to understand way to do it is:

def count(sub, string):
    count = 0
    for i in xrange(len(string)):
        if string[i:].startswith(sub):
            count += 1
    return count

count('baba', 'abababa baba alibababa')
#output: 5

If you like short snippets, you can make it less readable but smarter:

def count(subs, s):
    return sum((s[i:].startswith(subs) for i in xrange(len(s))))

This uses the fact that Python can treat boolean like integers.

How can I find the number of overlapping sequences in a String with Python? [duplicate]

Tags:

python

Martin Thoma

2 Answers

NPE

e-satis

Recent Activity

Donate For Us

How can I find the number of overlapping sequences in a String with Python? [duplicate]

Tags:

python

Martin Thoma

2 Answers

NPE

e-satis

Related questions

Recent Activity

Donate For Us