Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Regex slower than expected

I read a cool article on how to avoid creating slow regular expressions. Generally speaking it looks like the longer and more explicit and regex is the faster it will complete. A greedy regex can be exponentially slower.

I thought I would test this out by measuring the time it takes to complete a more complex/explicit statement with a less complex/greedy statement. For the most part everything seems to hold true, but I have one greedy statement that clocked in slower. Here are two examples:

import re
from timeit import timeit

# This works as expected, the explicit is faster than the greedy.
# http_x_real_ip explicit 
print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,3}\.\d{1,3}.\d{1,3}.\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
1.159849308001867

# http_x_real_ip greedy
print(timeit(setup="import re", stmt='''r = re.search(r'((?:\d{1,3}\.){3}\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
1.7421739230003368

# This does not work as expected, greedy is faster.
# time_local explicit
print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,2}/\w{3}/[2][0]\d{2}:\d{2}:\d{2}:\d{2}\s[+][0]{4})', "[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
1.248802040994633

# time_local greedy
print(timeit(setup="import re", stmt='''r = re.search(r'\[(.*)\]', "[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
1.0256699790043058

Is the local_time explict regex just poorly written?

like image 843
HammerMeetNail Avatar asked Jul 01 '15 14:07

HammerMeetNail


People also ask

How do I make regular expressions faster?

Expose Literal Characters Regex engines match fastest when anchors and literal characters are right there in the main pattern, rather than buried in sub-expressions. Hence the advice to "expose" literal characters whenever you can take them out of an alternation or quantified expression. Let's look at two examples.

Does regex affect performance?

Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.

Is using regex slow?

Yet matching a string with a regex can be surprisingly slow. So slow it can even stop any JS app or take 100% of a server CPU time causing denial of service (DOS). At TextMaster we used regular expressions in an important part of our translation platform.

Is compiled regex faster?

Regex has an interpreted mode and a compiled mode. The compiled mode takes longer to start, but is generally faster.


1 Answers

The more a regular expression has to backtrack, the slower it is.

This might not hold for very small input data. However, who would care about the performance on small data? :D


This topic is well covered in this article:

  • http://www.regular-expressions.info/catastrophic.html

Also there are interesting contributions in this question:

  • Greedy vs. Reluctant vs. Possessive Quantifiers
like image 136
fferri Avatar answered Oct 05 '22 04:10

fferri