Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Number of regex matches

Tags:

python

regex

I'm using the finditer function in the re module to match some things and everything is working.

Now I need to find out how many matches I've got. Is it possible without looping through the iterator twice? (one to find out the count and then the real iteration)

Some code:

imageMatches = re.finditer("<img src\=\"(?P<path>[-/\w\.]+)\"", response[2])
# <Here I need to get the number of matches>
for imageMatch in imageMatches:
    doStuff

Everything works, I just need to get the number of matches before the loop.

like image 734
dutt Avatar asked Oct 09 '10 03:10

dutt


People also ask

How do you find the number of matches in a regular expression?

To count the number of regex matches, call the match() method on the string, passing it the regular expression as a parameter, e.g. (str. match(/[a-z]/g) || []). length . The match method returns an array of the regex matches or null if there are no matches found.

How do you count the number of matches in Python?

To count a regex pattern multiple times in a given string, use the method len(re. findall(pattern, string)) that returns the number of matching substrings or len([*re. finditer(pattern, text)]) that unpacks all matching substrings into a list and returns the length of it as well.

Which regex matches one or more digits?

Occurrence Indicators (or Repetition Operators): +: one or more ( 1+ ), e.g., [0-9]+ matches one or more digits such as '123' , '000' . *: zero or more ( 0+ ), e.g., [0-9]* matches zero or more digits. It accepts all those in [0-9]+ plus the empty string.

What is regex for numbers?

Definition and Usage. The [0-9] expression is used to find any character between the brackets. The digits inside the brackets can be any numbers or span of numbers from 0 to 9. Tip: Use the [^0-9] expression to find any character that is NOT a digit.


5 Answers

If you know you will want all the matches, you could use the re.findall function. It will return a list of all the matches. Then you can just do len(result) for the number of matches.

like image 108
JoshD Avatar answered Oct 06 '22 04:10

JoshD


If you always need to know the length, and you just need the content of the match rather than the other info, you might as well use re.findall. Otherwise, if you only need the length sometimes, you can use e.g.

matches = re.finditer(...)
...
matches = tuple(matches)

to store the iteration of the matches in a reusable tuple. Then just do len(matches).

Another option, if you just need to know the total count after doing whatever with the match objects, is to use

matches = enumerate(re.finditer(...))

which will return an (index, match) pair for each of the original matches. So then you can just store the first element of each tuple in some variable.

But if you need the length first of all, and you need match objects as opposed to just the strings, you should just do

matches = tuple(re.finditer(...))
like image 33
intuited Avatar answered Oct 06 '22 03:10

intuited


#An example for counting matched groups
import re

pattern = re.compile(r'(\w+).(\d+).(\w+).(\w+)', re.IGNORECASE)
search_str = "My 11 Char String"

res = re.match(pattern, search_str)
print(len(res.groups())) # len = 4  
print (res.group(1) ) #My
print (res.group(2) ) #11
print (res.group(3) ) #Char
print (res.group(4) ) #String
like image 30
Mods Vs Rockers Avatar answered Oct 06 '22 03:10

Mods Vs Rockers


If you find you need to stick with finditer(), you can simply use a counter while you iterate through the iterator.

Example:

>>> from re import *
>>> pattern = compile(r'.ython')
>>> string = 'i like python jython and dython (whatever that is)'
>>> iterator = finditer(pattern, string)
>>> count = 0
>>> for match in iterator:
        count +=1
>>> count
3

If you need the features of finditer() (not matching to overlapping instances), use this method.

like image 44
Rafe Kettler Avatar answered Oct 06 '22 05:10

Rafe Kettler


I know this is a little old, but this but here is a concise function for counting regex patterns.

def regex_cnt(string, pattern):
    return len(re.findall(pattern, string))

string = 'abc123'

regex_cnt(string, '[0-9]')
like image 38
Travis Jones Avatar answered Oct 06 '22 04:10

Travis Jones