Consider the following:
>>> import re
>>> a = "first:second"
>>> re.findall("[^:]*", a)
['first', '', 'second', '']
>>> re.sub("[^:]*", r"(\g<0>)", a)
'(first):(second)'
re.sub()'s behavior makes more sense initially, but I can also understand re.findall()'s behavior. After all, you can match an empty string between first and : that consists only of non-colon characters (exactly zero of them), but why isn't re.sub() behaving the same way? Shouldn't the result of the last command be (first)():(second)()?
re.search() scans the string and returns only the first match, which may start at any position. re.findall(pattern, string) scans the string from left to right and returns all non-overlapping matches of the pattern, in the order they are found. The shape of the result depends on the pattern: if the pattern has no capturing groups, findall() returns a list of strings that each match the whole pattern.
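As a small illustration of how the result of findall() depends on capturing groups, here is a sketch that reuses the string from the question. The patterns and variable names are made up for the example.

```python
import re

text = "first:second"

# No capturing groups: each item is the text of the whole match.
no_groups = re.findall(r"[^:]+", text)

# One capturing group: each item is the text of that group only.
one_group = re.findall(r"([^:])[^:]*", text)

# Several capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"([^:])([^:]*)", text)

print(no_groups)   # ['first', 'second']
print(one_group)   # ['f', 's']
print(two_groups)  # [('f', 'irst'), ('s', 'econd')]
```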
You use the * which allows empty matches:
'first' -> matched
':' -> not in the character class but, as the pattern can be empty due
to the *, an empty string is matched -->''
'second' -> matched
'$' -> at the end of the string the pattern can again match zero characters,
an empty string is matched -->''
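One way to see where each of these matches sits is to print the match positions with re.finditer(). This is only a sketch, and the variable names are illustrative.

```python
import re

a = "first:second"

# Collect (start, end, text) for every match, including the empty ones.
matches = [(m.start(), m.end(), m.group()) for m in re.finditer("[^:]*", a)]

for start, end, text in matches:
    print(start, end, repr(text))
# 0 5 'first'
# 5 5 ''
# 6 12 'second'
# 12 12 ''
```

The two empty matches have equal start and end positions: one sits just before the colon, the other at the very end of the string.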
Quoting the documentation for re.findall():
Empty matches are included in the result unless they touch the beginning of another match.
The reason you don't see empty matches in sub results is explained in the documentation for re.sub():
Empty matches for the pattern are replaced only when not adjacent to a previous match.
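Note that the sentence quoted above comes from older documentation. From Python 3.7 on, re.sub() also replaces empty matches that are adjacent to a previous non-empty match, so the two functions agree. The sketch below checks the question's example against whichever interpreter runs it; the expected strings assume that the version cut-off is exactly 3.7.

```python
import re
import sys

a = "first:second"

found = re.findall("[^:]*", a)
result = re.sub("[^:]*", r"(\g<0>)", a)

# Assumption: Python 3.7 is the release where re.sub() started replacing
# empty matches adjacent to a previous non-empty match.
if sys.version_info >= (3, 7):
    expected = "(first)():(second)()"
else:
    expected = "(first):(second)"

print(found)               # ['first', '', 'second', '']
print(result)
print(result == expected)  # True
```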
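Try this:
print(re.sub('(?:Choucroute garnie)*', '#', 'ornithorynque'))
And now this:
print(re.sub('(?:nithorynque)*', '#', 'ornithorynque'))
In the first result every character is surrounded by single # characters. In the second, the empty match right after nithorynque is not replaced, so there is no consecutive # (on Python 3.7 and later that trailing empty match is replaced as well, so the result ends in ##).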