How to count consecutive repetitions of a substring in a string?

Tags: python, string

I need to find the longest run of consecutive (non-overlapping) repetitions of a substring in a string. I can count all occurrences, but I don't know how to count only the consecutive ones. For instance:

string = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
substring = "AA"

Here, "AA" occurs once at the beginning of the string, then 4 times in a row, then 2 times, and so on. I want the largest run, which in this example is 4.

How can I do that?

asked Apr 09 '20 23:04 by inmortable



2 Answers

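Regular expressions shine when searching through strings. Here you can find all runs of one or more "AA" with the pattern (?:AA)+. The (?: ... ) syntax makes the group non-capturing: the parentheses are used for grouping only, so re.findall() returns each whole match instead of just the group.

Once you have the runs, you can use max() with key=len to pick the longest one, then divide its length by the length of the substring to get the number of repetitions.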

import re

s = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"

groups = re.findall(r'(?:AA)+', s)
print(groups)
# ['AA', 'AAAAAAAA', 'AAAA', 'AA']

largest = max(groups, key=len)
print(len(largest) // 2)
# 4
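
The snippet above hardcodes "AA" and the divisor 2. As a hedged sketch of how it might be generalized, the function below (the name max_consecutive_repeats is made up for illustration) escapes the substring with re.escape() and wraps the group in a lookahead. The lookahead is a variation, not part of the answer's code: a plain findall() scan consumes each match, so it can undercount when the substring can overlap itself (for example "aba" in "ababaaba"). Checking every start position avoids that at the cost of more work on long inputs.

```python
import re

def max_consecutive_repeats(text: str, sub: str) -> int:
    """Return the largest number of back-to-back copies of sub in text."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    # re.escape makes characters such as '.' or '+' in sub match literally.
    # The lookahead (?=...) matches at every position without consuming text,
    # and group 1 captures the greedy run of copies starting at that position.
    pattern = re.compile(f"(?=((?:{re.escape(sub)})+))")
    runs = (len(m.group(1)) // len(sub) for m in pattern.finditer(text))
    return max(runs, default=0)  # 0 when sub never occurs

s = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
print(max_consecutive_repeats(s, "AA"))            # 4
print(max_consecutive_repeats(s, "REQ"))           # 2
print(max_consecutive_repeats("ababaaba", "aba"))  # 2
```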
answered Dec 05 '22 04:12 by Mark


One way to do it with basic operations is to search for the pattern "AA" in the string and keep appending another "AA" to the pattern until it is no longer found:

string  = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
count   = 0
pattern = "AA"
while pattern in string:
    count += 1
    pattern += "AA"

output:

print(count) # 4

It could also be written on a single line like this:

count = next(r-1 for r in range(1,len(string)+2) if "AA"*r not in string)
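
Both forms have edge cases worth guarding when the pattern is not fixed. A minimal sketch, assuming the pattern arrives as a parameter (the function names here are hypothetical): an empty pattern is contained in every string, so the while loop would never terminate, and the one-line form needs an upper bound of len(text) + 2 so that the generator always yields a value, even when the text consists entirely of a single-character pattern.

```python
def max_run_by_growth(text: str, sub: str) -> int:
    """Grow the candidate by one copy of sub until it no longer occurs in text."""
    if not sub:
        # "" is contained in every string, so the loop below would never end.
        raise ValueError("sub must be a non-empty string")
    count = 0
    candidate = sub
    while candidate in text:
        count += 1
        candidate += sub
    return count

def max_run_one_liner(text: str, sub: str) -> int:
    """Same idea as a generator expression; sub is assumed to be non-empty."""
    # sub * (len(text) + 1) is always longer than text, so it can never occur
    # in text and next() is guaranteed to find a value before the range ends.
    return next(r - 1 for r in range(1, len(text) + 2) if sub * r not in text)

print(max_run_by_growth("AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA", "AA"))  # 4
print(max_run_one_liner("AAA", "A"))  # 3
```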

You could also use the find() method instead of the in operator. That lets each search resume from the position of the previous match instead of starting over from the beginning of the string:

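string  = "AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA"
pattern = "AA"

repeated = ""   # grows by one copy of pattern per iteration
position = 0    # index of the latest match; find() returns -1 when there is none
while position >= 0:
    repeated += pattern
    # a longer run cannot start before the first match of the shorter one
    position = string.find(repeated, position)
# the last value of repeated was not found, so subtract one repetition
count = len(repeated) // len(pattern) - 1

print(count) # 4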
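
For reuse, the same find()-based loop could be wrapped in a function. This is only a sketch under the assumption that the pattern is passed in as an argument; the name max_run_with_find is invented here, and the empty-pattern check is an addition that the original loop does not have.

```python
def max_run_with_find(text: str, sub: str) -> int:
    """Return the longest run of consecutive copies of sub, searching with str.find()."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    count = 0
    repeated = sub
    position = text.find(repeated)
    while position >= 0:
        count += 1          # repeated (count copies of sub) was found
        repeated += sub     # try one more copy
        # Resume from the previous match: a longer run cannot start earlier.
        position = text.find(repeated, position)
    return count

print(max_run_with_find("AASDASDDAAAAAAAAERQREQREQRAAAAREWQRWERAAA", "AA"))  # 4
print(max_run_with_find("abcabcabxabc", "abc"))                              # 2
```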
answered Dec 05 '22 04:12 by Alain T.