
Find out how many times a regex matches in a string in Python

Tags: python, regex

Is there a way that I can find out how many matches of a regex are in a string in Python? For example, if I have the string "It actually happened when it acted out of turn."

I want to know how many times "t a" appears in the string. In that string, "t a" appears twice. I want my function to tell me it appeared twice. Is this possible?

Dan asked Sep 03 '09 16:09




2 Answers

import re
len(re.findall(pattern, string_to_search))
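Applied to the sentence from the question, this can be sketched as follows. The names `text`, `pattern` and `count` are illustrative and not part of the answer above.

```python
import re

text = "It actually happened when it acted out of turn."
pattern = r"t a"  # a literal 't', a space, then 'a'

# findall returns a list of every non-overlapping match,
# so the length of that list is the number of matches.
count = len(re.findall(pattern, text))
print(count)  # 2: "It actually" and "it acted"
```

If the pattern contains capturing groups, findall returns the groups instead of the whole match, but the length of the list is still the number of matches.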
SilentGhost answered Sep 21 '22 13:09


The existing solutions based on findall are fine for non-overlapping matches (and no doubt optimal, except maybe for a HUGE number of matches), although alternatives such as sum(1 for m in re.finditer(thepattern, thestring)) (to avoid ever materializing the list when all you care about is the count) are also quite possible. Somewhat idiosyncratic would be using subn and ignoring the resulting string:

def countnonoverlappingrematches(pattern, thestring):
    return re.subn(pattern, '', thestring)[1]

The only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then, re.subn(pattern, '', thestring, 100)[1] might be practical (returning 100 whether there are 100 matches, or 1000, or even larger numbers).
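To make the two alternatives concrete, here is a minimal sketch. The helper names `count_lazy` and `count_capped` are hypothetical, and the limit is passed as the keyword argument `count=`, which is an assumption about style (the positional form is the one used in the text above).

```python
import re

def count_lazy(pattern, text):
    """Count non-overlapping matches without building a list of them."""
    return sum(1 for _ in re.finditer(pattern, text))

def count_capped(pattern, text, limit):
    """Count non-overlapping matches, stopping once `limit` is reached.

    re.subn returns (new_string, number_of_substitutions). count=0 means
    "no limit", so this helper is only meaningful for limit >= 1.
    """
    return re.subn(pattern, "", text, count=limit)[1]

text = "ab" * 1000                        # 1000 matches of "ab"
print(count_lazy("ab", text))             # 1000
print(count_capped("ab", text, 100))      # 100, capped
print(count_capped("ab", "ababab", 100))  # 3, fewer matches than the cap
```

The capped version still builds the substituted string and throws it away, so it only pays off when the cap is small relative to the number of matches.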

Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition, e.g., with pattern being 'a+' and thestring being 'aa', would you consider this to be just one match, or three (the first a, the second one, both of them), or...?

Assuming for example that you want possibly-overlapping matches starting at distinct spots in the string (which then would give TWO matches for the example in the previous paragraph):

def countoverlappingdistinct(pattern, thestring):
    total = 0
    start = 0
    there = re.compile(pattern)
    while True:
        mo = there.search(thestring, start)
        if mo is None: return total
        total += 1
        start = 1 + mo.start()

Note that you do have to compile the pattern into a RE object in this case: function re.search does not accept a start argument (starting position for the search) the way method search does, so you'd have to be slicing thestring as you go -- definitely more effort than just having the next search start at the next possible distinct starting point, which is what I'm doing in this function.
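A self-contained sketch of the same approach, with a few sample calls, might look like this. The name `count_overlapping` is hypothetical, and the `start <= len(text)` guard is an added precaution for patterns that can match the empty string; it is not part of the function above.

```python
import re

def count_overlapping(pattern, text):
    """Count matches that begin at distinct positions, allowing overlap."""
    regex = re.compile(pattern)  # compiled so that search() accepts a start position
    total = 0
    start = 0
    # Stop once start has moved past the end of the text, so a pattern
    # that can match the empty string cannot keep the loop running.
    while start <= len(text):
        mo = regex.search(text, start)
        if mo is None:
            break
        total += 1
        start = mo.start() + 1  # next search begins one character later
    return total

print(len(re.findall("a+", "aa")))      # 1: the non-overlapping count
print(count_overlapping("a+", "aa"))    # 2: matches start at index 0 and index 1
print(count_overlapping("aa", "aaaa"))  # 3: matches start at indexes 0, 1 and 2
```

For the string in the question the two counts agree, since the two occurrences of "t a" do not overlap.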

Alex Martelli answered Sep 18 '22 13:09