From the Python docs for re.compile():
Note The compiled versions of the most recent patterns passed to re.match(), re.search() or re.compile() are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.
However, in my testing this assertion doesn't hold up. When I time the following snippets, which use the same pattern repeatedly, the compiled version is still substantially faster than the uncompiled one (whose compiled form should supposedly be cached).
Is there something I am missing here that explains the time difference?
import timeit
setup = """
import re
pattern = "p.a.t.t.e.r.n"
target = "p1a2t3t4e5r6n"
r = re.compile(pattern)
"""
print "compiled:", \
    min(timeit.Timer("r.search(target)", setup).repeat(3, 5000000))
print "uncompiled:", \
    min(timeit.Timer("re.search(pattern, target)", setup).repeat(3, 5000000))
Results:
compiled: 2.26673030059
uncompiled: 6.15612802627
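For anyone reproducing this today, here is a Python 3 rendition of the same benchmark (a sketch only: the iteration count is reduced so it finishes quickly, and absolute numbers will of course vary by machine):

```python
import timeit

setup = """
import re
pattern = "p.a.t.t.e.r.n"
target = "p1a2t3t4e5r6n"
r = re.compile(pattern)
"""

# Time the precompiled pattern object vs. the module-level function.
compiled = min(timeit.Timer("r.search(target)", setup).repeat(3, 200000))
uncompiled = min(timeit.Timer("re.search(pattern, target)", setup).repeat(3, 200000))
print("compiled:  ", compiled)
print("uncompiled:", uncompiled)
```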
Here's the (CPython) implementation of re.search:
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
and here is re.compile:
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
which relies on re._compile:
def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)  # _cache is a dict
    if p is not None:
        return p
    pattern, flags = key
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError('Cannot process flags argument with a compiled pattern')
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError, "first argument must be string or compiled pattern"
    try:
        p = sre_compile.compile(pattern, flags)
    except error, v:
        raise error, v  # invalid expression
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p
So you can see that as long as the regex is already in the cache dictionary, the only extra work is the lookup itself, which still means building a temporary key tuple and making a few extra function calls on every match. That per-call bookkeeping is what the benchmark is measuring.
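To get a feel for that overhead, here is a simplified sketch (the cache dict and key shape are stand-ins, not the real re internals) that times a direct method call against a build-key-then-lookup-then-call path, mimicking what re.search does even on a cache hit:

```python
import re
import timeit

pattern = "p.a.t.t.e.r.n"
compiled = re.compile(pattern)
# Simplified stand-in for re._cache, keyed the way _compile keys it.
cache = {(type(pattern), pattern, 0): compiled}

def direct(target):
    # What r.search(target) does: a single method call.
    return compiled.search(target)

def via_cache(target):
    # What re.search(pattern, target) roughly does on a cache hit:
    # build a key tuple, look it up, then call .search on the result.
    key = (type(pattern), pattern, 0)
    return cache[key].search(target)

t_direct = timeit.timeit(lambda: direct("p1a2t3t4e5r6n"), number=200000)
t_cached = timeit.timeit(lambda: via_cache("p1a2t3t4e5r6n"), number=200000)
```

Both paths find the same match; the second just pays for the key construction and dictionary lookup on every call.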
Update: In the good ole' days (the code copied above), the cache was invalidated completely whenever it got too big. These days the cache cycles, dropping its oldest items first. This implementation relies on the insertion ordering of Python dictionaries (which was an implementation detail until Python 3.7). In CPython before Python 3.6, this would have dropped an arbitrary value from the cache (which is arguably still better than invalidating the whole cache).
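A minimal sketch of such a "cycling" cache, relying on the insertion order of modern Python dicts to evict the oldest entry when full (similar in spirit to the modern behavior, not the actual re code):

```python
MAXCACHE = 3  # tiny limit, just for illustration
cache = {}

def cache_put(key, value):
    if len(cache) >= MAXCACHE:
        # Drop the oldest (first-inserted) entry instead of clearing everything.
        cache.pop(next(iter(cache)))
    cache[key] = value

for i in range(5):
    cache_put(i, str(i))

print(list(cache))  # → [2, 3, 4]: keys 0 and 1 were evicted oldest-first
```

Compared to `_cache.clear()`, this keeps recently used patterns warm instead of forcing every pattern to be recompiled after the cache fills up.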