How do I make an arbitrary Perl regex wholly non-capturing?

Tags: regex, perl

How can I remove capturing from arbitrarily nested sub-groups in a Perl regex string? I'd like to nest any regex inside an enveloping expression that captures the sub-regex as a whole, along with statically known subsequent groups. Do I need to transform the regex string manually so that every group is a non-capturing (?:) group (and hope I don't mess up), or is there a Perl regex or library mechanism that provides this?

# How do I 'flatten' $regex to protect $2 and $3?
# Searching 'ABCfooDE' for 'foo' OK, but '((B|(C))fo(o)?(?:D|d)?)', etc., breaks.
# I.E., how would I turn it effectively into '(?:(?:B|(?:C))fo(?:o)?(?:D|d)?)'?
sub check {
  my($line, $regex) = @_;
  if ($line =~ /(^.*)($regex)(.*$)/) {
    print "<", $1, "><", $2, "><", $3, ">\n";
  }
}

Addendum: I am vaguely aware of $&, $`, and $' and have been advised to avoid them if possible, and I don't have access to ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} in my Perl 5.8 environment. The example above can be partitioned into 2/3 chunks using methods like these, and more complex real cases could manually iterate this, but I think I'd like a general solution if possible.
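For the specific layout in check() above, one sketch that should work on Perl 5.8 is to take the match in list context and read the prefix, the whole sub-match, and the suffix by position from both ends of the capture list. This is an illustration rather than one of the answers quoted here; the name split_around is hypothetical. It does not help with backreferences inside $regex, whose group numbers still shift once the pattern is embedded.

```perl
use strict;
use warnings;

# Hypothetical replacement for check(): returns (prefix, match, suffix),
# or an empty list when the line does not match.
sub split_around {
    my ($line, $regex) = @_;
    # In list context a match returns every capture in order, so the
    # first two elements and the last one are the groups we wrote
    # ourselves, however many groups $regex adds in between.
    my @caps = $line =~ /(^.*)($regex)(.*$)/;
    return unless @caps;
    return @caps[0, 1, -1];
}

for my $regex ('foo', '((B|(C))fo(o)?(?:D|d)?)') {
    my @parts = split_around('ABCfooDE', $regex);
    print join('', map { "<$_>" } @parts), "\n";
}
# prints:
# <ABC><foo><DE>
# <AB><CfooD><E>
```

This only covers groups whose distance from either end of the enclosing pattern is known in advance; a known group sitting between two arbitrary sub-regexes would still need its index computed.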

Accepted Answer: What I wish existed, and surprisingly (to me at least) does not, is an encapsulating group that makes its contents opaque, such that subsequent positional backreferences see the contents as a single entity and named references are de-scoped. gbacon has a potentially useful workaround for Perl 5.10+, and FM shows a manual iterative mechanism for any version that can accomplish the same effect in specific cases, but j_random_hacker makes the call: there is no real language mechanism for encapsulating subexpressions.

Jeff asked Aug 24 '10 01:08


1 Answer

In general, you can't.

Even if you could transform all (...)s into (?:...)s, this would not work in the general case because the pattern might require backreferences: e.g. /(.)X\1/, which matches any character, followed by an X, followed by the originally matched character.
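A small sketch of that failure, assuming a naive textual rewrite of (...) into (?:...). The helper name matches is hypothetical. Patterns are passed as strings so that a pattern which no longer compiles is caught at run time instead of aborting the script.

```perl
use strict;
use warnings;

# Returns 1 if $pattern matches $text, 0 otherwise. A pattern that
# fails to compile (caught by eval) also counts as 0.
sub matches {
    my ($text, $pattern) = @_;
    my $ok = eval { $text =~ /$pattern/ ? 1 : 0 };
    return defined $ok ? $ok : 0;
}

print "original  on aXa: ", matches('aXa', '(.)X\1'),   "\n";
print "original  on aXb: ", matches('aXb', '(.)X\1'),   "\n";
print "flattened on aXa: ", matches('aXa', '(?:.)X\1'), "\n";
```

The original pattern accepts aXa and rejects aXb. The flattened one has no group left for \1 to refer to, so it cannot accept aXa the way the original does.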

So, absent a Perl mechanism for discarding captured results "after the fact", there is no way to solve your problem for all regexes. The best you can do (or could do if you had Perl 5.10) is to use gbacon's suggestion and hope to generate a unique name for the capture buffer.
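For readers who do have Perl 5.10 or later, the named-capture idea might look roughly like the following. This is a sketch of the general approach, not gbacon's actual code; the function name check_named and the group names __pre, __match and __post are hypothetical, chosen only because an arbitrary $regex is unlikely to use them.

```perl
use strict;
use warnings;
use 5.010;   # named captures and %+ need Perl 5.10 or later

# Returns (prefix, match, suffix), or an empty list on no match.
sub check_named {
    my ($line, $regex) = @_;
    # Our own groups are looked up by name, so extra numbered groups
    # inside $regex do not disturb them.
    if ($line =~ /(?<__pre>^.*)(?<__match>$regex)(?<__post>.*$)/) {
        return ($+{__pre}, $+{__match}, $+{__post});
    }
    return;
}

my @parts = check_named('ABCfooDE', '((B|(C))fo(o)?(?:D|d)?)');
print join('', map { "<$_>" } @parts), "\n";   # prints <AB><CfooD><E>
```

Two caveats remain: a $regex that happens to define a group with the same name can shadow ours in %+, and numbered backreferences such as \1 inside $regex still point at the wrong group once the pattern is embedded.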

j_random_hacker answered Oct 05 '22 03:10