I'm using python regex library to parse some strings and currently I found that my regex is either too complicated or the string I'm searching is too long.
Here's an example of the hang up:
>>> import re
>>> reg = "(\w+'?\s*)+[-|~]\s*((\d+\.?\d+\$?)|(\$?\d+\.?\d+))"
>>> re.search(reg, "**LOOKING FOR PAYPAL OFFERS ON THESE PAINTED UNCOMMONS**") #Hangs here...
I'm not sure what's going on. Any help appreciated!
EDIT: Here's a link with examples of what I'm trying to match: Regxr
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.
The re.search() method takes a regular expression pattern and a string and searches for that pattern within the string. If the search is successful, search() returns a match object or None otherwise.
Python has a module named re to work with regular expressions. To use it, we need to import the module. The module defines several functions and constants to work with RegEx.
IGNORECASE : This flag allows for case-insensitive matching of the Regular Expression with the given string i.e. expressions like [A-Z] will match lowercase letters, too.
The reason why the code execution hangs is catastrophic backtracking due to one obligatory and 1+ optional patterns (those that can match an empty string) inside a quantified group (\w+'?\s*)+
that allows a regex engine to test a lot of matching paths, so many that it takes too long to complete.
I suggest unwrapping the problematic group in such a way that '
or \s
become obligatory and wrap them in an optional group:
(\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)
^^^^^^^^^^^^^^^^^^^***
See the regex demo
Here, (\w+(?:['\s]+\w+)*)
will match 1+ word chars, and then 0+ sequences of 1+ '
or whitespaces followed with 1+ word chars. This way, the pattern becomes linear and the regex engine fails the match quicker if a non-matching string occurs.
The rest of the pattern:
\s*[-~]\s*
- either -
or ~
wrapped with 0+ whitespaces(\$?\d+(?:\.\d+)?\$?)
- Group 2 capturing
\$?
- 1 or 0 $
symbols\d+
- 1+ digits(?:\.\d+)?
- 1 or 0 zero sequences of:
\.
- a dot\d+
- 1+ digits\$?
- 1 or 0 $
symbolsIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With