
Why should we use re.purge() with Python regular expressions?

Tags: python, regex

What is the significance of clearing the cache when working with re in Python? Does it help with performance or memory management? What happens if we ignore it? Where should re.purge() be called?

Neha Bhushan asked Feb 19 '19 19:02



1 Answer

Most code will not need to worry about purging the re module cache. Purging brings very little memory benefit, and can actually hurt performance.

The cache is used to store compiled regular expression objects when you use the top-level re.* functions directly rather than calling re.compile(pattern) yourself. For example, if you used re.search(r'<some pattern>', string_value) in a loop, then the re module would compile '<some pattern>' only once and store it in the cache, avoiding having to re-compile the pattern on each iteration.
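A minimal sketch of the cache at work. It relies on CPython behaviour that is an implementation detail rather than a documented guarantee: while a pattern is cached, re.compile() hands back the same compiled object, and after re.purge() it has to build a new one. re.compile() is used here only because it makes the cached object visible; the pattern and variable names are made up for illustration.

```python
import re

PATTERN = r"\d+-\d+"  # hypothetical pattern, chosen only for illustration

re.purge()                    # start from an empty cache
first = re.compile(PATTERN)   # compiled and stored in the cache
second = re.compile(PATTERN)  # served from the cache, no recompilation
same_before_purge = first is second

re.purge()                    # drop every cached pattern
third = re.compile(PATTERN)   # has to be compiled again
same_after_purge = first is third

print(same_before_purge, same_after_purge)
```

On CPython this is expected to print True False: the second call reuses the cached object, while the call after the purge produces a fresh one.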

How many such objects are cached and how the cache is managed is an implementation detail, really, but regular expression objects are lightweight, taking up at most a few hundred bytes each, and Python won't store more than a few hundred of them (Python 3.7 stores up to 512).

The cache is also automatically managed, so purging is not normally needed at all. Use it if you specifically need to account for regular expression compilation time in a repeated time trial test involving re.* functions, or are testing the caching functionality itself. The only locations in the Python standard library that call re.purge() are in tests (specifically in the test_re unittests for the re module and the reference leak test in the regression test suite).
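As a rough sketch of the time-trial case, the snippet below times the same re.search() call with and without a purge before each call, so the second measurement includes compilation time. The pattern, input string, function names, and repeat count are all assumptions chosen for illustration, and the absolute numbers will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r"(\w+)@(\w+)\.com"          # hypothetical pattern
TEXT = "contact: someone@example.com"  # hypothetical input

def search_cached():
    # Compiled on the first call, then looked up in the cache.
    return re.search(PATTERN, TEXT)

def search_uncached():
    # Purging first forces a fresh compilation on every call.
    re.purge()
    return re.search(PATTERN, TEXT)

cached_seconds = timeit.timeit(search_cached, number=1000)
uncached_seconds = timeit.timeit(search_uncached, number=1000)

print(f"cached:   {cached_seconds:.4f}s")
print(f"uncached: {uncached_seconds:.4f}s")
```

The gap between the two figures gives a feel for what the cache saves, which is also why purging in ordinary code tends to cost more than it gains.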

If your code is creating a lot of regular expression objects that you intend to keep using, it is better to use re.compile() and keep your own references to those compiled expression objects. See the re.compile() documentation:

The sequence

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

Note: The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.
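One way to keep your own references is to compile patterns once at module level and reuse them. The sketch below assumes two hypothetical patterns and a hypothetical helper, parse_date(); it also shows that objects you hold yourself keep working after re.purge(), since purging only clears the module's internal cache.

```python
import re

# Hypothetical patterns, compiled once at import time and kept in module-level names.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
WORD_RE = re.compile(r"[A-Za-z]+")

def parse_date(text):
    """Return (year, month, day) as ints, or None if no date is found."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

re.purge()  # clearing the module cache does not affect objects we hold ourselves
print(parse_date("released on 2019-02-19"))
print(WORD_RE.findall("re.purge() is rarely needed"))
```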

Martijn Pieters answered Nov 18 '22 08:11
