Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I print an expanded regex in unexpanded form?

Tags:

regex

perl

Is it possible to print a regex created using expanded notation (qr/.../x) in non-expanded form? For example:

my $decimal = qr/
  (?=\d|\.\d)  # look-ahead to ensure at least one of the optional parts matches
  \d*          # optional whole digits
  (?:\.\d*)?   # optional decimal point and fractional digits
/x;

say $decimal;

I want this to be printed as (?=\d|\.\d)\d*(?:\.\d*)?.

I could write a parser to strip out the non-functional portions but that would be replicating what perl already does and I'd probably get some of the non-trivial cases wrong.

(Yes, this seems a bit silly. I have a use case where I need to print a lot of messages like matched <pattern> and I'd like to limit messages to a single line while allowing expanded notation to be used for patterns.)

like image 870
Michael Carman Avatar asked Nov 15 '19 20:11

Michael Carman


1 Answers

Perl doesn't provide such a utility. It parses regex patterns; it doesn't generate them. The stringification of the object is the exact string provided to the parser, wrapped in a (?:...) that accounts for the flags. The string provided to the parser is the post-interpolation literal minus the delimiters.[1]

That said, this would be trivial to do with a regex parser.

There is YAPE::Regex, but it hasn't been updated in a long time. For example, it doesn't support the (?^:...) found in the stringification of regex in modern version of Perl.

There is also Regexp::Parser. It's newer, but it doesn't support (?^:...) either! But if we were to work around that, it would be be perfect since naturally ignores whitespace and comments! All we need to do is parse the pattern and get a stringifiction from the parse tree.

Finally, there's Regexp::Parsertron. It's the newest, and it does support (?^:...), but it doesn't distinguish whitespace and comments from "exact matches" tokens.

So let's use Regexp::Parser.[2]

#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );

use Regexp::Parser qw( );

{
   @ARGV == 1
      or die("usage\n");

   my $re = $ARGV[0];

   # R::P doesn't support «(?^:...)», so we'll
   # provide a backwards-compatible stringification.
   $re =~ s{^\(\?\^(\w*):}{
      my %on = map { $_ => 1 } split //, $1;
      my $on  = join "", grep  $on{$_}, qw( i m s x );
      my $off = join "", grep !$on{$_}, qw( i m s x );
      "(?$on-$off:"
   }e;

   my $parser = Regexp::Parser->new($re);
   my $roots = $parser->root
      or die($parser->errmsg);

   say join "", map $_->visual, @$roots;
}

Test:

$ despace_re '(?^x:
   (?=\d|\.\d)  # look-ahead to ensure at least one of the optional parts matches
   \d*          # optional whole digits
   (?:\.\d*)?   # optional decimal point and fractional digits
)'
(?x-ims:(?=\d|\.\d)\d*(?:\.\d*)?)

  1. \Q, \u and similar are done at the same stage at interpolation. \N{...} is resolved to \N{U+...} in order to immortalize the current charnames settings. Other escapes such as \x27, \x{0000027}, \\ and \/ are preserved character for character.

  2. A solution based on YAPE::Regex was used in an earlier revision of this answer.

like image 158
ikegami Avatar answered Nov 03 '22 20:11

ikegami