The Python docs for findall() and finditer() state that:
Empty matches are included in the result unless they touch the beginning of another match
This can be demonstrated as follows:
In [20]: [m.span() for m in re.finditer('.*', 'test')]
Out[20]: [(0, 4), (4, 4)]
Can anyone tell me, though, why this pattern returns an empty match in the first place? Shouldn't .* consume the entire string and return a single match? And further, why is there no empty match at the end if I anchor the pattern to the beginning of the string? For example:
In [22]: [m.span() for m in re.finditer('^.*', 'test')]
Out[22]: [(0, 4)]
Note that finditer and findall find the same matches but return them differently. findall returns a list of all the matches in the given string as strings (or tuples of groups), while finditer returns an iterator that yields one Match object per non-overlapping match, not just the first one.
The finditer() function matches a pattern in a string and returns an iterator that yields the Match objects of all non-overlapping matches. In the call re.finditer(pattern, string, flags=0), pattern is the regular expression that you want to search for and string is the input string.
When the pattern has no capturing groups, the findall() method returns a list of strings. Each element is a matching substring of the string argument.
The findall() function scans the string from left to right and finds all the matches of the pattern in the string. The result of the findall() function depends on the pattern: if the pattern has no capturing groups, findall() returns a list of strings that match the whole pattern.
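To make the dependence on capturing groups concrete, here is a minimal sketch. The sample text and the patterns are made up for illustration; only re.findall() itself comes from the text above.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "width=10, height=20"

# No capturing groups: each element is the whole matched substring.
no_groups = re.findall(r"\w+=\d+", text)

# One capturing group: each element is just that group's text.
one_group = re.findall(r"\w+=(\d+)", text)

# Several capturing groups: each element is a tuple of the groups.
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(no_groups)   # ['width=10', 'height=20']
print(one_group)   # ['10', '20']
print(two_groups)  # [('width', '10'), ('height', '20')]
```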
The expression re.finditer() returns an iterator yielding Match objects over all non-overlapping matches for the re pattern in the string.
The re.finditer() function finds the same matches as re.findall(), but instead of a list of strings it returns an iterator yielding Match objects for the regex pattern. It scans the string from left to right and yields the matches in the order found. We can then loop over this iterator to extract all matches.
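A short sketch of that relationship, using a made-up pattern and input string: both functions locate the same substrings, and the Match objects from finditer() additionally carry position information.

```python
import re

# Hypothetical pattern and input, for illustration only.
pattern = r"\d+"
text = "a1 b22 c333"

# findall() returns the matched substrings directly.
strings = re.findall(pattern, text)

# finditer() yields one Match object per non-overlapping match.
matches = list(re.finditer(pattern, text))
from_iter = [m.group() for m in matches]
spans = [m.span() for m in matches]

print(strings)    # ['1', '22', '333']
print(from_iter)  # ['1', '22', '333']
print(spans)      # [(1, 2), (4, 6), (8, 11)]
```

Because finditer() is lazy, it can be the better choice when the input is large or when only the first few matches are needed.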
The pattern matches a continuous run of alphabetic characters. We find all the non-overlapping matches of this pattern in the string using the re.findall() function and print the list it returns.
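The example this description refers to is not shown, so the following is only a sketch of what it might look like. The pattern [a-zA-Z]+ and the input string are assumptions, not the original author's code.

```python
import re

# Assumed pattern: one or more ASCII letters in a row.
pattern = r"[a-zA-Z]+"

# Assumed input string, for illustration only.
text = "abc 123 def_ghi 45jk"

# Digits, spaces and the underscore all break a run of letters.
words = re.findall(pattern, text)

print(words)  # ['abc', 'def', 'ghi', 'jk']
```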
.* means zero or more characters, so once the four characters are consumed, the zero-length empty string at the end (which doesn't touch the start of any match) still remains and is matched too. With ^.*, that empty string at position 4 is not at the beginning of the string, so the anchored pattern cannot match it.
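The behavior described in this answer can be checked with a small script. This is a sketch that reuses the 'test' string from the question; the .+ variant is an addition here, included to show one way of avoiding the trailing empty match.

```python
import re

text = "test"

# Unanchored: after '.*' consumes "test", an empty match is still
# possible at index 4, the end of the string.
unanchored = [m.span() for m in re.finditer(r".*", text)]

# Anchored: without re.MULTILINE, '^' only matches at index 0,
# so the follow-up search starting at index 4 finds nothing.
anchored = [m.span() for m in re.finditer(r"^.*", text)]

# '.+' needs at least one character, so it cannot match the empty string.
one_or_more = [m.span() for m in re.finditer(r".+", text)]

print(unanchored)   # [(0, 4), (4, 4)]
print(anchored)     # [(0, 4)]
print(one_or_more)  # [(0, 4)]
```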