Python Regex slower than expected

I read a cool article on how to avoid creating slow regular expressions. Generally speaking, it looks like the longer and more explicit a regex is, the faster it will complete. A greedy regex can be exponentially slower.

I thought I would test this out by timing a more complex/explicit statement against a less complex/greedy statement. For the most part everything seems to hold true, but I have one explicit statement that clocked in slower than its greedy counterpart. Here are two examples:

import re
from timeit import timeit

# This works as expected: the explicit pattern is faster than the greedy one.
# http_x_real_ip explicit
print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,3}\.\d{1,3}.\d{1,3}.\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
# Output: 1.159849308001867

# http_x_real_ip greedy
print(timeit(setup="import re", stmt='''r = re.search(r'((?:\d{1,3}\.){3}\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
# Output: 1.7421739230003368

# This does not work as expected: the greedy pattern is faster.
# time_local explicit
print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,2}/\w{3}/[2][0]\d{2}:\d{2}:\d{2}:\d{2}\s[+][0]{4})', "[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
# Output: 1.248802040994633

# time_local greedy
print(timeit(setup="import re", stmt='''r = re.search(r'\[(.*)\]', "[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
# Output: 1.0256699790043058

Is the explicit time_local regex just poorly written?

asked Jul 01 '15 14:07

HammerMeetNail

1 Answer

The more a regular expression has to backtrack, the slower it is.

(This might not hold for very small input data. However, who would care about the performance on small data? :D)
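To see how the cost of backtracking grows with input size, here is a minimal sketch. The patterns `(a+)+$` and `a+$`, the helper names `time_search` and `compare`, and the input sizes are my own illustrative choices, not anything from the question. Both patterns accept the same strings, but the nested one can split a run of `a` characters in many ways and tries all of them before giving up.

```python
import re
import time

# Nested quantifiers: on failure the engine retries every way of
# splitting the run of 'a's between the inner and outer '+'.
NESTED = re.compile(r'(a+)+$')
# Same set of matching strings, but no nesting, so far fewer retries.
FLAT = re.compile(r'a+$')


def time_search(pattern, text):
    """Return (match_or_None, elapsed_seconds) for a single search."""
    start = time.perf_counter()
    result = pattern.search(text)
    return result, time.perf_counter() - start


def compare(sizes=(8, 14, 20)):
    """Time both patterns on inputs that are guaranteed not to match."""
    rows = []
    for n in sizes:
        text = 'a' * n + 'b'  # the trailing 'b' forces the match to fail
        nested_match, nested_t = time_search(NESTED, text)
        flat_match, flat_t = time_search(FLAT, text)
        rows.append((n, nested_match, flat_match, nested_t, flat_t))
    return rows


if __name__ == "__main__":
    for n, _, _, nested_t, flat_t in compare():
        print(f"n={n:>3}  nested={nested_t:.6f}s  flat={flat_t:.6f}s")
```

On short inputs like the one in the question the difference is lost in the noise, which is why the timings there do not say much about backtracking. Keep the sizes small when trying this out: the nested pattern roughly doubles its work with every extra character.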

This topic is well covered in this article:

http://www.regular-expressions.info/catastrophic.html

Also there are interesting contributions in this question:

Greedy vs. Reluctant vs. Possessive Quantifiers
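As a small illustration of the greedy vs. reluctant distinction, here is a sketch that applies the question's greedy pattern to a longer log line. The sample line, the reluctant variant and the negated-class variant are assumptions of mine for demonstration. Possessive quantifiers are left out because they are not available in older Python versions.

```python
import re

# Hypothetical log line with a second pair of brackets later on.
line = '[23/Jun/2015:11:10:57 +0000] "GET /a[1]" 200'

# Greedy: '.*' runs to the end of the line, then backs up to the LAST ']'.
greedy = re.search(r'\[(.*)\]', line).group(1)

# Reluctant: '.*?' grows one character at a time and stops at the FIRST ']'.
reluctant = re.search(r'\[(.*?)\]', line).group(1)

# Negated class: cannot cross a ']' at all, so no backing up is needed.
negated = re.search(r'\[([^\]]*)\]', line).group(1)

print(greedy)     # 23/Jun/2015:11:10:57 +0000] "GET /a[1
print(reluctant)  # 23/Jun/2015:11:10:57 +0000
print(negated)    # 23/Jun/2015:11:10:57 +0000
```

On the exact string from the question all three return the same text, because it contains only one closing bracket. The greedy version only becomes a problem, for correctness as well as speed, once there is more text after the first `]`.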

answered Oct 05 '22 04:10

fferri

Tags: performance, python, regex
