 

Python: how to count overlapping occurrences of a substring [duplicate]

Tags:

python

I wanted to count the number of times that a string like 'aa' appears in 'aaa' (or 'aaaa').

The most obvious code gives the wrong (or at least, not the intuitive) answer:

'aaa'.count('aa')
1 # should be 2
'aaaa'.count('aa')
2 # should be 3

Does anyone have a simple way to fix this?

nivk asked Oct 10 '13 17:10





2 Answers

From str.count() documentation:

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

So no, nothing is broken: count() is behaving as documented, and overlapping matches are simply not counted.
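To make the documented behaviour concrete, here is a small sketch (the variable names are illustrative only). It shows that the scan resumes after the end of each match, and that the optional start and end arguments act like slice bounds.

```python
text = "aaaa"

# Non-overlapping: after the match at index 0, the scan resumes at index 2,
# so the occurrence starting at index 1 is never seen.
total = text.count("aa")            # 2

# start/end are interpreted like slice bounds, so these two agree:
from_one = text.count("aa", 1)      # counts within text[1:] == "aaa" -> 1
sliced = text[1:].count("aa")       # 1

print(total, from_one, sliced)
```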

If you want to count the number of overlapping matches, use a regex:

>>> import re
>>> 
>>> len(re.findall(r'(a)(?=\1)', 'aaa'))
2

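This finds every occurrence of a that is followed by another a. The second a is not consumed by the match, because the look-ahead is a zero-width assertion, so it is still available to start the next match. Note that this pattern is written specifically for the needle 'aa'.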
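The pattern above hard-codes the letter a. As a hedged sketch of how the same look-ahead idea might be generalized to an arbitrary substring, the whole needle can be placed inside the look-ahead and escaped with re.escape. The helper name count_overlapping_regex is hypothetical and not part of the original answer.

```python
import re

def count_overlapping_regex(haystack, needle):
    """Count overlapping occurrences of needle in haystack (illustrative helper)."""
    if not needle:
        # An empty needle would match at every position; reject it explicitly.
        raise ValueError("needle must be non-empty")
    # The look-ahead matches the empty string at every index where needle
    # starts, so no characters are consumed and overlaps are all found.
    pattern = "(?=" + re.escape(needle) + ")"
    return len(re.findall(pattern, haystack))

print(count_overlapping_regex("aaa", "aa"))   # 2
print(count_overlapping_regex("aaaa", "aa"))  # 3
```

re.escape matters when the needle contains regex metacharacters such as . or *, which would otherwise be interpreted as part of the pattern.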

Rohit Jain answered Sep 25 '22 01:09



haystack = "aaaa"
needle   = "aa"

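# Compare the slice starting at each possible index against the needle;
# each True counts as 1 in sum().
matches  = sum(haystack[i:i+len(needle)] == needle
               for i in range(len(haystack)-len(needle)+1))

# on Python 2, use xrange instead of range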
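As an alternative to the slice comparison above, here is a minimal sketch of a different technique: calling str.find repeatedly and restarting the search one character after each hit. This is not the answerer's method, and the function name count_overlapping_find is hypothetical.

```python
def count_overlapping_find(haystack, needle):
    """Count overlapping occurrences using str.find (illustrative helper)."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # Advance by one character only, so that a match overlapping the
        # previous one can still be found.
        start = haystack.find(needle, start + 1)
    return count

print(count_overlapping_find("aaa", "aa"))   # 2
print(count_overlapping_find("aaaa", "aa"))  # 3
```

Because str.find jumps straight to the next candidate position, this version avoids building a slice at every index, which may matter for long strings with few matches.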
kindall answered Sep 25 '22 01:09
