I'd like to search for lines that don't start with a pound sign (#) on indented code.
Currently, I'm using the regex ^\s*([^\s#].*)
with multiline option on.
My problem is that on non commented lines it works perfectly.
On commented lines the regex engine performs a backtrack due to \s*
all the way from the comment sign to the start of the line, which can sometimes cause 40 or 50 backtrack steps.
The regex works perfectly on python code. It's just not very efficient due to the backtracking caused by the engine.
Any idea as of how to avoid it?
Bonus: It's rather funny that the regex engine doesn't recognize the fact that it's searching for [^\s]
one by one in \s*
and causes this amount of backtracking. What are the challenges in making the re engine work so?
Bonus 2: Using only the stdlib re module. As I cannot add 3rd parties. (I'm technically searching using sublime text but want to know how to generally do it in Python)
Backtracking occurs when a regular expression pattern contains optional quantifiers or alternation constructs, and the regular expression engine returns to a previous saved state to continue its search for a match.
In short, to match a literal backslash, one has to write '\\\\' as the RE string, because the regular expression must be "\\", and each backslash must be expressed as "\\" inside a regular Python string literal.
The most portable regex would be ^[ \t\n]*$ to match an empty string (note that you would need to replace \t and \n with tab and newline accordingly) and [^ \n\t] to match a non-whitespace string.
Use atomic feature of lookarounds to avoid backtrack:
^(?=(\s*))\1([^#].*)
^^^^^ ^
This usage is simplified in a negative lookahead which is proposed by @vks beautifully.
or possessive quantifiers while using regex
module:
^\s*+([^#].*)
or even atomic groups:
^(?>\s*)([^#].*)
Sublime Text supports all three since being on PCRE.
and for bonus part, no it's not funny. If you be more eagle-eye on it you'll see it's not [^\s]
which is literally equal to \S
but it is a little bit different: [^\s#]
which for engine means it has two different paths at each step to look for so it backtracks to reach one.
You can simply say
^(?!\s*#).*
This takes just 6 steps in comparison to 33 steps taken by yours.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With