Some regex engines support backtracking-related verbs: (*PRUNE)
, (*SKIP)
, (?{doSomeCode();})
*, etc. I already know what these verbs do, from Reference - What does this regex mean?.
I'm inclined to think that these verbs are somewhat esoteric, or at least an unnecessary step to a more low-level type of programming. Instead of needing a (*PRUNE)
, wouldn't it be better (reducing the complexity for both regex writer/programmer and reader) to have more optimizations behind the scenes (like in the compiler or engine)?
So, in what situation is it useful to include a backtracking-related verb in a regex, in practice? Is there ever a benefit to do so?
*
While not technically a backtracking control verb, many examples that execute arbitrary code in regexes do it in a way that affects or is affected by backtracking.
Background
These features were originally experimental, though they are no longer labeled as such in the Perl regex tutorial. It's no wonder that I am unable to find much about these constructs on the Internet (especially when the search is clogged with irrelevant results that have skip
or prune
outside the code). I'll bet that there are numerous people advanced enough in regex to use these verbs that simply don't know about them.
So there are a number of practical barriers preventing widespread use:
Trying to find an answer that looks past this and finds a good use-case, or the reasoning from the developers who created these features.
I'm also aware that a similar closed (too opinion-based) question exists, but it didn’t answer my question as the only answer to say "yes" to that question gave two links, one of which was an esoteric use (additionally, I cannot understand it...). The other one, while it gave a situation of when to use (*FAIL)
, did not did not address any of the other constructs I mentioned, nor did it use (*FAIL)
as a Backtracking mechanism. From what I understand, (*FAIL)
can be emulated by any regex that always fails.
Let me respecify what I am looking for in an answer:
One good piece of documentation you can look at is the section about directives in Parse::RecDescent. The <commit>
directive, in particular, seems somehow related to (*PRUNE)
(although there is a (*COMMIT)
too), and contains an instructive example.
My personal impression is that most of the times they provide you tools to make your regexes better (e.g. more performing, or clearer) but not necessarily more effective. As an example, you can probably live without (*PRUNE)
, but you would suffer from heavier backtracking and how this affects you depends on what you're trying to match. Re (*FAIL)
, it might probably be emulated with a non-matching sub-regex, but it's much more clear what the intent is that it at least enhances readability.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With