
Why use re.match(), when re.search() can do the same thing?

Tags: python, regex

From the documentation, it's very clear that:

  • match() -> apply pattern match at the beginning of the string
  • search() -> search through the string and return first match

And search() with a leading '^' and without the re.M flag would work the same as match().

Then why does Python have match()? Isn't it redundant? Are there any performance benefits to keeping match() in Python?

jai.maruthi asked Apr 22 '15 19:04



2 Answers

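The match() and search() methods of a compiled pattern treat the pos argument differently in an important way: match() anchors the pattern at pos, whereas '^' in a search() still refers to the real start of the string, not to pos: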

>>> import re
>>> s = "a ab abc abcd"
>>> re.compile('a').match(s, pos=2)
<_sre.SRE_Match object; span=(2, 3), match='a'>
>>> print(re.compile('^a').search(s, pos=2))
None

match() makes it possible to write a tokenizer and to ensure that characters are never skipped. search() has no way of saying "start from the earliest allowable character".
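To make that concrete, here is a minimal sketch. The sample string and the pattern are made up for illustration. Given the same pos, search() quietly skips ahead to the next place the pattern fits, while match() refuses to move:

```python
import re

word = re.compile(r"[a-z]+")
s = "abc 123 def"

# search() scans forward from pos, so the space and the digits are skipped.
found = word.search(s, 3)
print(found.span())          # (8, 11)

# match() only succeeds if the pattern fits exactly at pos.
at_three = word.match(s, 3)
print(at_three)              # None

at_eight = word.match(s, 8)
print(at_eight.span())       # (8, 11)
```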

Example use of match to break up a string with no gaps:

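def tokenize(s, patt):
    """Yield consecutive matches of the compiled pattern patt, covering s with no gaps."""
    at = 0
    while at < len(s):
        # match() only succeeds if a token starts exactly at index `at`.
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        # Resume where this token ended, so no character is skipped.
        at = m.end()
        yield m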
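
A quick usage sketch of the tokenizer above. The token pattern here is a hypothetical one chosen for illustration (runs of digits, runs of letters, or whitespace); any compiled pattern would do:

```python
import re

def tokenize(s, patt):
    at = 0
    while at < len(s):
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        at = m.end()
        yield m

# Hypothetical token pattern: digits, letters, or whitespace.
token = re.compile(r"\d+|[A-Za-z]+|\s+")

tokens = [m.group() for m in tokenize("x1 42", token)]
print(tokens)  # ['x', '1', ' ', '42']

# '+' is not covered by the pattern, so tokenizing stops there instead of skipping it.
try:
    list(tokenize("x1 + 42", token))
    error = None
except ValueError as exc:
    error = str(exc)
print(error)  # Did not expect character at location 3
```

One caveat with this sketch: a pattern that can match the empty string would never advance `at`, so the token pattern should always consume at least one character.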
Eric answered Sep 29 '22 12:09



"Why" questions are hard to answer. As a matter of fact, you could define the function re.match() like this:

import re

def match(pattern, string, flags=0):
    return re.search(r"\A(?:" + pattern + ")", string, flags)

(because \A always matches at the start of the string, regardless of the re.M flag status).
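
A small sketch of why \A is the right anchor here rather than '^' (the two-line sample string is made up for illustration):

```python
import re

text = "first\nsecond"

# With re.M, '^' also matches right after a newline, so search() finds line 2.
caret = re.search(r"^second", text, re.M)
print(caret.span())   # (6, 12)

# '\A' only ever matches at the very start of the string, re.M or not.
slash_a = re.search(r"\Asecond", text, re.M)
print(slash_a)        # None

# re.match() is likewise unaffected by re.M.
plain = re.match(r"second", text, re.M)
print(plain)          # None
```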

So re.match() is a useful shortcut but not strictly necessary. It can be especially confusing for Java programmers, whose Pattern.matches() anchors the match to both the start and the end of the string (which is probably a more common use case than anchoring only to the start).
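
For comparison, Python 3.4 and later also have re.fullmatch(), which anchors at both ends and so behaves much like the Java method. A minimal sketch with a made-up input:

```python
import re

# re.match() anchors only at the start, so trailing text is allowed.
start_only = re.match(r"\d+", "123abc")
print(start_only.group())   # 123

# re.fullmatch() requires the whole string to match.
whole_fail = re.fullmatch(r"\d+", "123abc")
print(whole_fail)           # None

whole_ok = re.fullmatch(r"\d+", "123")
print(whole_ok.group())     # 123
```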

It's different for the match and search methods of regex objects, though, as Eric has pointed out.

Tim Pietzcker answered Sep 29 '22 13:09
