Is it possible to print a regex created using expanded notation (qr/.../x
) in non-expanded form? For example:
my $decimal = qr/
(?=\d|\.\d) # look-ahead to ensure at least one of the optional parts matches
\d* # optional whole digits
(?:\.\d*)? # optional decimal point and fractional digits
/x;
say $decimal;
I want this to be printed as (?=\d|\.\d)\d*(?:\.\d*)?
.
I could write a parser to strip out the non-functional portions but that would be replicating what perl already does and I'd probably get some of the non-trivial cases wrong.
(Yes, this seems a bit silly. I have a use case where I need to print a lot of messages like matched <pattern>
and I'd like to limit messages to a single line while allowing expanded notation to be used for patterns.)
Perl doesn't provide such a utility. It parses regex patterns; it doesn't generate them. The stringification of the object is the exact string provided to the parser, wrapped in a (?:...)
that accounts for the flags. The string provided to the parser is the post-interpolation literal minus the delimiters.[1]
That said, this would be trivial to do with a regex parser.
There is YAPE::Regex, but it hasn't been updated in a long time. For example, it doesn't support the (?^:...)
found in the stringification of regex in modern version of Perl.
There is also Regexp::Parser. It's newer, but it doesn't support (?^:...)
either! But if we were to work around that, it would be be perfect since naturally ignores whitespace and comments! All we need to do is parse the pattern and get a stringifiction from the parse tree.
Finally, there's Regexp::Parsertron. It's the newest, and it does support (?^:...)
, but it doesn't distinguish whitespace and comments from "exact matches" tokens.
So let's use Regexp::Parser.[2]
#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );
use Regexp::Parser qw( );
{
@ARGV == 1
or die("usage\n");
my $re = $ARGV[0];
# R::P doesn't support «(?^:...)», so we'll
# provide a backwards-compatible stringification.
$re =~ s{^\(\?\^(\w*):}{
my %on = map { $_ => 1 } split //, $1;
my $on = join "", grep $on{$_}, qw( i m s x );
my $off = join "", grep !$on{$_}, qw( i m s x );
"(?$on-$off:"
}e;
my $parser = Regexp::Parser->new($re);
my $roots = $parser->root
or die($parser->errmsg);
say join "", map $_->visual, @$roots;
}
Test:
$ despace_re '(?^x:
(?=\d|\.\d) # look-ahead to ensure at least one of the optional parts matches
\d* # optional whole digits
(?:\.\d*)? # optional decimal point and fractional digits
)'
(?x-ims:(?=\d|\.\d)\d*(?:\.\d*)?)
\Q
, \u
and similar are done at the same stage at interpolation. \N{...}
is resolved to \N{U+...}
in order to immortalize the current charnames settings. Other escapes such as \x27
, \x{0000027}
, \\
and \/
are preserved character for character.
A solution based on YAPE::Regex was used in an earlier revision of this answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With