From the documentation, it's very clear that:
match() -> applies the pattern at the beginning of the string
search() -> searches through the string and returns the first match
And search with '^' and without the re.M flag would work the same as match.
Then why does Python have match()? Isn't it redundant? Are there any performance benefits to keeping match() in Python?
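The equivalence described above can be checked with a small sketch. The sample string and variable names here are illustrative assumptions, not taken from the documentation:

```python
import re

text = "foo\nbar"

# Without re.M, '^' only matches at the very start of the string,
# so search('^...') agrees with match().
assert re.match(r"bar", text) is None
assert re.search(r"^bar", text) is None
assert re.match(r"foo", text).span() == re.search(r"^foo", text).span()

# With re.M, '^' also matches right after each newline, so the two diverge.
multiline_search = re.search(r"^bar", text, re.M)   # finds 'bar' on line two
multiline_match = re.match(r"bar", text, re.M)      # still anchored at index 0
```

Here multiline_search is a match object while multiline_match is None, which is why the question restricts the claim to the case without re.M.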
The pos argument behaves differently in important ways:
>>> import re
>>> s = "a ab abc abcd"
>>> re.compile('a').match(s, pos=2)
<_sre.SRE_Match object; span=(2, 3), match='a'>
>>> print(re.compile('^a').search(s, pos=2))
None
match makes it possible to write a tokenizer and ensure that characters are never skipped. search has no way of saying "start from the earliest allowable character".
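To make the "never skipped" point concrete, here is a minimal sketch comparing the two methods on a compiled pattern. The pattern and input string are made up for illustration:

```python
import re

number = re.compile(r"\d+")
s = "ab12"

# search() scans forward from pos and silently skips the leading 'ab'.
skipped = number.search(s, pos=0)

# match() must succeed exactly at pos, so it reports failure instead.
strict = number.match(s, pos=0)

# Starting at index 2, match() succeeds because the digits begin right there.
at_two = number.match(s, pos=2)
```

A tokenizer built on search would accept "ab12" without ever noticing the two unmatched characters, whereas match forces the caller to deal with them.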
Example use of match to break up a string with no gaps:
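def tokenize(s, patt):
    """Yield consecutive matches of patt that cover s with no gaps."""
    at = 0
    while at < len(s):
        # match() must succeed exactly at index `at`; it never skips ahead.
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        at = m.end()
        yield m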
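A possible way to exercise the tokenizer is sketched below. The function is repeated so the snippet runs on its own, and the token pattern (words or runs of whitespace) is a hypothetical choice made only for this example:

```python
import re

def tokenize(s, patt):
    at = 0
    while at < len(s):
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        at = m.end()
        yield m

# Hypothetical token pattern: a word, or a run of whitespace.
token = re.compile(r"\w+|\s+")

# Every character of the input ends up in exactly one token.
tokens = [m.group() for m in tokenize("a ab abc", token)]

# '-' is not covered by the pattern, so the tokenizer stops at index 1.
try:
    list(tokenize("a-b", token))
    error = None
except ValueError as exc:
    error = str(exc)
```

Note that the pattern passed in should not be able to match the empty string, since an empty match would leave at unchanged and the loop would never advance.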
"Why" questions are hard to answer. As a matter of fact, you could define the function re.match() like this:
import re

def match(pattern, string, flags=0):
    return re.search(r"\A(?:" + pattern + ")", string, flags)
(because \A always matches at the start of the string, regardless of the re.M flag status).
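The following sketch illustrates why \A and the non-capturing group both matter in that definition. The helper name match_via_search and the sample strings are assumptions made for this example:

```python
import re

def match_via_search(pattern, string, flags=0):
    # Same idea as the definition above, under a different (hypothetical) name.
    return re.search(r"\A(?:" + pattern + ")", string, flags)

text = "a\nb"

# Under re.M, '^' also matches after the newline, but '\A' does not.
caret = re.search(r"^b", text, re.M)
anchored = re.search(r"\Ab", text, re.M)

# The (?:...) wrapper keeps '\A' applied to every alternative of the pattern.
ungrouped = re.search(r"\Ax|b", "ab")          # '\A' binds only to 'x'
grouped = match_via_search(r"x|b", "ab")       # both alternatives are anchored
builtin = re.match(r"x|b", "ab")
```

Without the group, the alternation would leave the second branch unanchored, and the result would no longer agree with re.match.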
So re.match is a useful shortcut but not strictly necessary. It's especially confusing for Java programmers, who have Pattern.matches(), which anchors the match to both the start and the end of the string (probably a more common use case than anchoring to the start alone).
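For comparison, Python 3.4 and later provide re.fullmatch, which anchors at both ends in the way described for Java. A brief sketch with made-up input:

```python
import re

# match() anchors only at the start, so trailing text is allowed.
prefix = re.match(r"\d+", "123abc")

# fullmatch() requires the whole string to match.
whole = re.fullmatch(r"\d+", "123abc")
whole_ok = re.fullmatch(r"\d+", "123")
```

Here prefix matches "123", whole is None, and whole_ok matches the entire string.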
It's different for the match and search methods of regex objects, though, as Eric has pointed out.