
Differences between re.match, re.search, re.fullmatch [duplicate]

Tags: python, regex

The regex docs say:

Pattern.match(...)

If zero or more characters at the beginning of string match this regular expression

Pattern.fullmatch(...)

If the whole string matches this regular expression

Pattern.search(...)

Scan through string looking for the first location where this regular expression produces a match

Given the above, why couldn't someone just always use search to do everything? For example:

re.search(r'...'   # search
re.search(r'^...'  or re.search(r'\A...'   # match
re.search(r'^...$' or re.search(r'\A...\Z' # fullmatch

Are match and fullmatch just shortcuts (if they could be called that) for the search method? Or do they have other uses that I'm overlooking?

asked Nov 08 '19 21:11 by samuelbrody1249



2 Answers

Credit to @Ruzihm's answer, since parts of my answer derive from his.


Quick overview

A quick rundown of the differences:

  • re.match is anchored at the start: ^pattern
    • Ensures the string begins with the pattern
  • re.fullmatch is anchored at the start and end: ^pattern$
    • Ensures the full string matches the pattern (can be especially useful with alternations as described here)
  • re.search is not anchored: pattern
    • Ensures the string contains the pattern

A more in-depth comparison of re.match vs re.search can be found here
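
As a rough illustration of the "begins with / matches fully / contains" distinction in the overview, here is a minimal sketch. The pattern \d+ and the sample strings are assumptions chosen only for this example; they do not come from the question.

```python
import re

# Example pattern, chosen only for illustration: one or more digits.
pattern = re.compile(r"\d+")

begins = pattern.match("123abc")          # the string begins with digits
not_at_start = pattern.match("abc123")    # digits exist, but not at the start
contains = pattern.search("abc123")       # the string contains digits somewhere
whole = pattern.fullmatch("123")          # the entire string is digits
not_whole = pattern.fullmatch("123abc")   # trailing text prevents a full match

print(begins, not_at_start, contains, whole, not_whole)
```

Each call returns either a match object or None, so the results can be used directly in an if test.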


With examples:

aa            # string
a|aa          # regex

re.match:     a
re.search:    a
re.fullmatch: aa

 

ab            # string
^a            # regex

re.match:     a
re.search:    a
re.fullmatch: # None (no match)
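
The two examples above can be reproduced with a short script. This is only a sketch: the helper matched_text and the results dictionary are names introduced here for illustration.

```python
import re

# (pattern, string) pairs taken from the two examples above.
cases = [(r"a|aa", "aa"), (r"^a", "ab")]


def matched_text(m):
    """Return the matched substring, or None when there was no match."""
    return m.group(0) if m is not None else None


results = {}
for pattern, string in cases:
    results[(pattern, string)] = {
        "match": matched_text(re.match(pattern, string)),
        "search": matched_text(re.search(pattern, string)),
        "fullmatch": matched_text(re.fullmatch(pattern, string)),
    }

for key, value in results.items():
    print(key, value)
```

In the first case re.fullmatch returns aa rather than a because the engine backtracks into the second alternative once the first one fails to reach the end of the string.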

So what about \A and \Z anchors?

The documentation states the following:

Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).

And in the Pattern.fullmatch section it says:

If the whole string matches this regular expression, return a corresponding match object.

And, as initially found and quoted by Ruzihm in his answer:

Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with ^ will match at the beginning of each line.

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<re.Match object; span=(4, 5), match='X'>

To visualise this, take s = 'A\nB\nX'. Without re.MULTILINE the anchors sit only at the two ends of the string:

\A^A
B
X$\Z

# re.match('X', s)                  no match
# re.search('^X', s)                no match

# ------------------------------------------
# and the string above when re.MULTILINE is enabled effectively becomes

\A^A$
^B$
^X$\Z

# re.match('X', s, re.MULTILINE)    no match
# re.search('^X', s, re.MULTILINE)  match X

With regard to \A and \Z, re.MULTILINE changes neither of them: \A still matches only at the very start of the string and \Z only at the very end, so they are effectively the only ^ and $ in the whole string.

So using \A and \Z with any of the three methods yields the same results.
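
A small check of that claim, as a sketch using the same sample string as the documentation example (the variable names are made up for this illustration):

```python
import re

s = "A\nB\nX"

# ^ changes meaning under re.MULTILINE ...
caret_plain = re.search(r"^X", s)                 # no match
caret_multi = re.search(r"^X", s, re.MULTILINE)   # matches the last line

# ... but \A does not: it only ever matches at the start of the string.
start_plain = re.search(r"\AX", s)                # no match
start_multi = re.search(r"\AX", s, re.MULTILINE)  # still no match

# Likewise $ matches at the end of each line in MULTILINE mode, \Z does not.
dollar_multi = re.search(r"A$", s, re.MULTILINE)  # matches the first line
end_multi = re.search(r"A\Z", s, re.MULTILINE)    # no match

print(caret_plain, caret_multi, start_plain, start_multi, dollar_multi, end_multi)
```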


Answer (line anchors vs string anchors)

What this tells me is that re.match and re.fullmatch do not behave like the line anchors ^ and $; they behave like the string anchors \A and \Z.

answered Sep 17 '22 14:09 by ctwheels


Yes, they can be seen as shortcuts for re.search calls whose pattern starts with \A (match), or starts with \A and ends with \Z (fullmatch).

Because \A always specifies the beginning of the string, calling re.search with \A prepended to the pattern appears to be equivalent to re.match, even under MULTILINE mode. Some examples:

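import re

haystack = "A\nB\nZ"

# Plain prefix: both calls match at the start of the string.
matchstring = 'A'
x = re.match(matchstring, haystack)  # Match
y = re.search(r'\A' + matchstring, haystack)  # Match

# '$' before the newline matches at the end of the first line in MULTILINE mode.
matchstring = 'A$\nB'
x = re.match(matchstring, haystack, re.MULTILINE)  # Match
y = re.search(r'\A' + matchstring, haystack, re.MULTILINE)  # Match

# '$' after the newline sits right before 'B', which is not the end of a line.
matchstring = 'A\n$B'
x = re.match(matchstring, haystack, re.MULTILINE)  # No match
y = re.search(r'\A' + matchstring, haystack, re.MULTILINE)  # No match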

The same holds for wrapping the pattern between \A and \Z to reproduce fullmatch.
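
One caveat when building the \A...\Z form by string concatenation: a pattern containing a top-level alternation needs to be grouped first, otherwise \A binds only to the first alternative and \Z only to the last. The sketch below uses a hypothetical helper, search_as_fullmatch, which is not part of the re module, and it may not cover every pattern (for example, patterns that begin with global inline flags such as (?i) would need extra care).

```python
import re


def search_as_fullmatch(pattern, string, flags=0):
    """Hypothetical helper: emulate re.fullmatch with re.search.

    The non-capturing group keeps \\A and \\Z outside any top-level
    alternation in `pattern`.
    """
    return re.search(r"\A(?:" + pattern + r")\Z", string, flags)


pattern = r"a|aa"

full = re.fullmatch(pattern, "aa")
wrapped = search_as_fullmatch(pattern, "aa")

# Naive concatenation parses as (\Aa)|(aa\Z), which is a different regex.
naive = re.search(r"\A" + pattern + r"\Z", "aa")
naive_extra = re.search(r"\A" + pattern + r"\Z", "xaa")

print(full, wrapped, naive, naive_extra)
print(re.fullmatch(pattern, "xaa"), search_as_fullmatch(pattern, "xaa"))
```

Here full and wrapped both match aa, while the naive version matches only a on "aa" and even finds aa inside "xaa", a string that re.fullmatch rejects.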


Not including \A / \Z:

No, they treat MULTILINE differently. From the documentation:

Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.

...

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<re.Match object; span=(4, 5), match='X'>

Likewise, in MULTILINE mode, fullmatch() matches at the beginning and end of the string, and search() with '^...$' matches at the beginning and end of each line.
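
A quick sketch of the fullmatch case, reusing the sample string from the documentation example (the variable names are only for illustration):

```python
import re

s = "A\nB\nX"

# fullmatch must consume the whole string, regardless of MULTILINE.
full_line = re.fullmatch(r"B", s, re.MULTILINE)        # no match
full_anchored = re.fullmatch(r"^B$", s, re.MULTILINE)  # still no match
full_whole = re.fullmatch(r"A\nB\nX", s, re.MULTILINE) # matches everything

# search with ^...$ is satisfied by any single line in MULTILINE mode.
line = re.search(r"^B$", s, re.MULTILINE)              # matches the middle line

print(full_line, full_anchored, full_whole, line)
```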


answered Sep 20 '22 14:09 by Ruzihm