Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When should regex backtracking controls, like (*PRUNE), be used?

Some regex engines support backtracking-related verbs: (*PRUNE), (*SKIP), (?{doSomeCode();})*, etc. I already know what these verbs do, from Reference - What does this regex mean?.

I'm inclined to think that these verbs are somewhat esoteric, or at least an unnecessary step to a more low-level type of programming. Instead of needing a (*PRUNE), wouldn't it be better (reducing the complexity for both regex writer/programmer and reader) to have more optimizations behind the scenes (like in the compiler or engine)?

So, in what situation is it useful to include a backtracking-related verb in a regex, in practice? Is there ever a benefit to do so?


* While not technically a backtracking control verb, many examples that execute arbitrary code in regexes do it in a way that affects or is affected by backtracking.


Background

These features were originally experimental, though they are no longer labeled as such in the Perl regex tutorial. It's no wonder that I am unable to find much about these constructs on the Internet (especially when the search is clogged with irrelevant results that have skip or prune outside the code). I'll bet that there are numerous people advanced enough in regex to use these verbs that simply don't know about them.

So there are a number of practical barriers preventing widespread use:

  1. Features are experimental
  2. Features are obscure
  3. Features are advanced

Trying to find an answer that looks past this and finds a good use-case, or the reasoning from the developers who created these features.

I'm also aware that a similar closed (too opinion-based) question exists, but it didn’t answer my question as the only answer to say "yes" to that question gave two links, one of which was an esoteric use (additionally, I cannot understand it...). The other one, while it gave a situation of when to use (*FAIL), did not did not address any of the other constructs I mentioned, nor did it use (*FAIL) as a Backtracking mechanism. From what I understand, (*FAIL) can be emulated by any regex that always fails.

Let me respecify what I am looking for in an answer:

  • Relates specifically to backtracking
  • Non-Esoteric
  • Practical
  • More than an example of usage
  • Has an explanation for any examples given
  • May include background on reason for adding features
  • May include updates, with sources, relevant to the future of the features (in Perl or other regex flavors)
like image 361
Laurel Avatar asked Mar 23 '16 16:03

Laurel


1 Answers

One good piece of documentation you can look at is the section about directives in Parse::RecDescent. The <commit> directive, in particular, seems somehow related to (*PRUNE) (although there is a (*COMMIT) too), and contains an instructive example.

My personal impression is that most of the times they provide you tools to make your regexes better (e.g. more performing, or clearer) but not necessarily more effective. As an example, you can probably live without (*PRUNE), but you would suffer from heavier backtracking and how this affects you depends on what you're trying to match. Re (*FAIL), it might probably be emulated with a non-matching sub-regex, but it's much more clear what the intent is that it at least enhances readability.

like image 75
polettix Avatar answered Nov 06 '22 01:11

polettix