I have a long sequence, and I would like to know how often some sub-sequences occur in this sequence.
I know string.count(s, sub) (s.count(sub) in Python 3), but it only counts non-overlapping occurrences.
Is there a similar function that also counts overlapping occurrences?
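A quick check shows the problem. It uses the same example string that appears in the answers. str.count resumes its search after the end of each match, so any occurrence that overlaps the previous one is skipped:

```python
# str.count resumes searching after the end of each match,
# so overlapping occurrences are skipped.
haystack = 'abababa baba alibababa'
needle = 'baba'

non_overlapping = haystack.count(needle)
print(non_overlapping)    # 3, although 'baba' starts at 5 positions

print('aaaa'.count('aa'))  # 2, although 'aa' starts at 3 positions
```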
As an alternative to writing your own search function, you could use the re module:
In [22]: import re
In [23]: haystack = 'abababa baba alibababa'
In [24]: needle = 'baba'
In [25]: matches = re.finditer(r'(?=(%s))' % re.escape(needle), haystack)
In [26]: print([m.start(1) for m in matches])
[1, 3, 8, 16, 18]
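The above prints the starting positions of all (possibly overlapping) matches. It works because the lookahead (?=...) matches without consuming any characters, so after each match the search continues from the next position instead of jumping past the matched text.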
If all you need is the count, the following should do the trick:
In [27]: len(re.findall(r'(?=(%s))' % re.escape(needle), haystack))
Out[27]: 5
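If the haystack is very large and you only need the count, one possible variant consumes the lazy iterator returned by re.finditer rather than building a list with re.findall. This is a sketch; the function name count_overlapping_re is hypothetical, and the capturing group is dropped because only the number of matches matters here:

```python
import re

def count_overlapping_re(needle, haystack):
    """Count (possibly overlapping) occurrences of needle in haystack.

    The zero-width lookahead (?=...) matches without consuming characters,
    so a match can be reported at every position where needle starts.
    """
    # re.escape makes regex metacharacters in needle match literally.
    pattern = re.compile('(?=%s)' % re.escape(needle))
    # finditer yields matches lazily; summing 1 per match avoids a list.
    return sum(1 for _ in pattern.finditer(haystack))

print(count_overlapping_re('baba', 'abababa baba alibababa'))  # 5
```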
A simple, easy-to-understand way to do it is:
def count(sub, string):
    count = 0
    # Test every starting position, so overlapping matches are counted too.
    for i in range(len(string)):
        if string[i:].startswith(sub):
            count += 1
    return count
count('baba', 'abababa baba alibababa')
#output: 5
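Note that string[i:] copies the remainder of the string on every iteration, which can get slow for long sequences. A sketch of a different technique, assuming sub is non-empty, uses the start argument of str.find to jump directly from one match to the next while still allowing overlaps. The name count_overlapping is hypothetical:

```python
def count_overlapping(sub, string):
    """Count (possibly overlapping) occurrences of sub in string.

    Assumes sub is non-empty.
    """
    count = 0
    start = string.find(sub)
    while start != -1:
        count += 1
        # Resume one character after the start of the last match
        # (not after its end), so overlapping occurrences are found.
        start = string.find(sub, start + 1)
    return count

print(count_overlapping('baba', 'abababa baba alibababa'))  # 5
```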
If you like short snippets, you can make it more compact, at some cost in readability:
def count(subs, s):
    # One boolean per starting position; sum() adds up the True values.
    return sum(s[i:].startswith(subs) for i in range(len(s)))
This relies on the fact that Python booleans behave as integers (True is 1, False is 0), so sum() counts the positions where subs starts.
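In Python, bool is a subclass of int, which is why this works. A small demonstration:

```python
# bool is a subclass of int: True == 1 and False == 0.
print(isinstance(True, int))     # True
print(True + True)               # 2
print(sum([True, False, True]))  # 2

# Summing startswith() results therefore counts the positions where subs begins.
s, subs = 'aaaa', 'aa'
print(sum(s.startswith(subs, i) for i in range(len(s))))  # 3
```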