
Why have re.match()?

Tags:

python

regex

I know this topic has already been discussed multiple times here on StackOverflow, but I'm looking for a better answer.

While I appreciate the differences, I have not been able to find a definitive explanation of why the re module in Python provides both match() and search(). Couldn't I get the same behavior with search() if I prepend ^ in single-line mode and \A in multiline mode? Am I missing anything?
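To make the anchoring question concrete, here is a small sketch (the sample string and variable names are only illustrative). It shows why ^ alone is not enough under re.MULTILINE, while \A keeps search() pinned to the start of the string:

```python
import re

text = "first line\nsecond line"

# match() only tries the very start of the string, even with MULTILINE.
match_result = re.match(r'second', text, re.MULTILINE)

# With MULTILINE, ^ also matches right after a newline, so search() finds line 2.
caret_result = re.search(r'^second', text, re.MULTILINE)

# \A matches only at the start of the string, regardless of MULTILINE.
anchored_result = re.search(r'\Asecond', text, re.MULTILINE)

print(match_result)          # None
print(caret_result.start())  # 11
print(anchored_result)       # None
```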

I tried to understand the implementation by reading the _sre.c code, and as far as I can tell the search (sre_search()) is implemented by moving a pointer through the string to be searched and applying sre_match() at each position until a match is found.
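As a rough Python model of that description (an illustrative sketch only, not the actual C code; the function name naive_search is made up here), a search can be expressed as a match attempted at each successive start offset:

```python
import re

def naive_search(pattern, string, flags=0):
    """Hypothetical model of search(): try match() at every start offset."""
    compiled = re.compile(pattern, flags)
    for start in range(len(string) + 1):
        # Pattern.match(string, pos) anchors the attempt at index `pos`.
        found = compiled.match(string, start)
        if found is not None:
            return found
    return None

hit = naive_search(r'line', 'first line')
print(hit.start())  # 6

# ^ still refers to the real start of the string, so later offsets cannot match.
miss = naive_search(r'^second', 'first line\nsecond line')
print(miss)  # None
```

Note that in this model an anchored pattern that fails at offset 0 is still retried at every later offset, which is the extra work the question is asking about.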

So I guess that using re.match() might be slightly faster than using re.search() with the corresponding anchored regular expression (with ^ or \A). Is that the reason?

I also researched the python-dev ML archives but to no avail.

>>> string="""first line
... second line"""
>>> print re.match('first', string, re.MULTILINE)
<_sre.SRE_Match object at 0x1072ae7e8>
>>> print re.match('second', string, re.MULTILINE)
None
>>> print re.search(r'\Afirst', string, re.MULTILINE)
<_sre.SRE_Match object at 0x1072ae7e8>
>>> print re.search(r'\Asecond', string, re.MULTILINE)
None
spider asked Mar 12 '15 10:03



1 Answer

As you already know, re.match tests the pattern only at the start of the string, while re.search scans the whole string until it finds a match.

So, is there a difference between re.match('toto', s) and re.search('^toto', s), and if so, what is it?
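As far as the returned result goes, the two calls agree. A quick check, using two made-up sample strings, one that starts with toto and one that does not:

```python
import re

samples = ["toto and the rest", "titi and the rest"]
spans = []

for s in samples:
    m = re.match(r'toto', s)
    r = re.search(r'^toto', s)
    # Compare the matched spans (or None when nothing matched).
    m_span = m.span() if m else None
    r_span = r.span() if r else None
    spans.append((m_span, r_span))
    print("%r: match=%s search=%s" % (s, m_span, r_span))
```

Any difference therefore has to lie in how much work each call does.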

Let's run a small test:

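#!/usr/bin/python

import time
import re

# Unanchored pattern for match(), anchored pattern for search()
p1 = re.compile(r'toto')
p2 = re.compile(r'^toto')

ssize = 1000

# s1 starts with 'toto' (the pattern succeeds); s2 does not (the pattern fails)
s1 = 'toto abcdefghijklmnopqrstuvwxyz012356789'*ssize
s2 = 'titi abcdefghijklmnopqrstuvwxyz012356789'*ssize

nb = 1000  # number of calls per timing loop

# Succeeding case: time match()
i = 0
t0 = time.time()
while i < nb:
    p1.match(s1)
    i += 1
t1 = time.time()

# Succeeding case: time the anchored search()
i = 0
t2 = time.time()
while i < nb:
    p2.search(s1)
    i += 1
t3 = time.time()

print("\nsucceed\nmatch:")
print(t1-t0)
print("search:")
print(t3-t2)


# Failing case: time match()
i = 0
t0 = time.time()
while i < nb:
    p1.match(s2)
    i += 1
t1 = time.time()

# Failing case: time the anchored search()
i = 0
t2 = time.time()
while i < nb:
    p2.search(s2)
    i += 1
t3 = time.time()

print("\nfail\nmatch:")
print(t1-t0)
print("search:")
print(t3-t2)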

Both approaches are tested against a string that matches (s1) and a string that does not (s2).

Results:

succeed
match:
0.000469207763672
search:
0.000494003295898

fail
match:
0.000430107116699
search:
0.46605682373

What can we conclude from these results?

1) Performance is similar when the pattern succeeds.

2) Performance is completely different when the pattern fails. This is the most important point: it means that re.search keeps testing every position in the string even though the pattern is anchored, whereas re.match stops immediately.

If you increase the size of the failing test string, you will see that re.match takes no more time, while the time taken by re.search grows with the string size.
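One way to check this scaling is a sketch based on the standard timeit module (the helper name time_failing and the sizes are arbitrary choices for illustration). Absolute numbers, and even the size of the gap, depend on the machine and the Python version, so no expected output is shown:

```python
import re
import timeit

MATCH_PATTERN = re.compile(r'toto')
SEARCH_PATTERN = re.compile(r'^toto')

def time_failing(size, repeats=100):
    """Time match() and anchored search() on a string that cannot match."""
    text = 'titi abcdefghijklmnopqrstuvwxyz' * size
    match_time = timeit.timeit(lambda: MATCH_PATTERN.match(text), number=repeats)
    search_time = timeit.timeit(lambda: SEARCH_PATTERN.search(text), number=repeats)
    return match_time, search_time

# Grow the failing string and compare how each timing evolves.
for size in (100, 1000, 10000):
    match_time, search_time = time_failing(size)
    print("size=%d match=%.6f search=%.6f" % (size, match_time, search_time))
```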

Casimir et Hippolyte answered Sep 18 '22 13:09