
How to do chained regex in Perl?

Tags: regex, perl

I was excited to learn that Perl can handle chained comparisons as of version 5.32.

However, I'm trying to do chained regex comparisons to keep my code shorter and cleaner:

#!/usr/bin/env perl

use 5.032;
use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie ':all';

if (9 > 2 < 3 < 4 > 0) {
    say 'chained expressions work.'
} else {
    say 'chained expressions do not work.'
}

my $x = 4;
my $z = 4;

if ($x == 4 == $z) {
    say 'chained equality works';
}

$x = 'x';
$z = 'x';

if ($x eq 'x' eq $z) {
    say 'chained string comparisons work.';
}

$x = '.';
$z = './.';
if ($x =~ m/\./ =~ $z) { # unfortunately this doesn't work
    # intended equivalent: if (($x =~ m/\./) && ($z =~ m/\./)) {
    say 'chained regex works.';
} else {
    say 'no chained regex.'; 
}

How can I correctly do chained comparisons with regex?

con asked Jan 20 '21 19:01



2 Answers

While 100 <= $x < 200 has an obvious meaning, A =~ B =~ C does not. Which of those operands should be strings? Which of them should be match operators? What operation should be performed?

You suggested putting a string on the RHS of the final =~ to match against it, but that's not how =~ works at all; the string is always on the LHS. $s =~ /a/ =~ /b/ could possibly make sense, but it's still not obvious what that would do, especially if there are captures. As such, this isn't supported.
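To see what the expression from the question actually does, here is a minimal sketch that spells out the left-to-right grouping by hand. It assumes only that =~ is left-associative and that a plain scalar on the right-hand side is treated as a pattern at run time; the names $first and $second are made up for illustration.

```perl
use strict;
use warnings;

my $x = '.';
my $z = './.';

# Step 1: the first binding is an ordinary match. In scalar context it
# yields a true value on success.
my $first = ($x =~ m/\./);

# Step 2: that result becomes the *string*, and $z is used as the *pattern*.
# As a regex, './.' means "any character, a slash, any character".
my $second = ($first =~ $z);

print "first:  ", ($first  ? 'matched' : 'no match'), "\n";
print "second: ", ($second ? 'matched' : 'no match'), "\n";
```

With these values the first step succeeds and the second does not, since the one-character result of the first match cannot satisfy a pattern that needs three characters. Nothing here ever tests $z against m/\./.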

Here are some options if you want to chain some matches:

all { $s =~ $_ } qr/a/, qr/b/

any { $s =~ $_ } qr/a/, qr/b/

none { $s =~ $_ } qr/a/, qr/b/

notall { $s =~ $_ } qr/a/, qr/b/

all { /a/ } $s1, $s2

any { /a/ } $s1, $s2

none { /a/ } $s1, $s2

notall { /a/ } $s1, $s2

These functions are provided by the core module List::Util.
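Applied to the variables from the question, the options above might look like the following sketch. The variable names on the left ($both_have_dot and so on) are illustrative, and the slash patterns are extra examples that are not in the question.

```perl
use strict;
use warnings;
use List::Util qw(all any);

my $x = '.';
my $z = './.';

# True only if every listed string contains a literal dot.
my $both_have_dot = all { /\./ } $x, $z;

# True only if the single string matches every listed pattern.
my $z_has_dot_and_slash = all { $z =~ $_ } qr/\./, qr{/};

# True if at least one of the strings contains a slash.
my $some_have_slash = any { m{/} } $x, $z;

print "both strings contain a dot\n"       if $both_have_dot;
print "\$z contains a dot and a slash\n"   if $z_has_dot_and_slash;
print "at least one string has a slash\n"  if $some_have_slash;
```

The first form is the closest match to the intent of the question: one pattern tested against several strings.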

ikegami answered Sep 26 '22 14:09


Note that Perl's "chaining" is more like a macro (and personally, I think "syntax" features that basically rearrange code are big red flags):

$x OP1 $y OP2 $z

This is effectively rewritten to something like this (with some other details accounted for but unimportant here):

$x OP1 $y and $y OP2 $z

This only works with certain operators—the ones that make comparisons.
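As a rough illustration of the rewrite described above, the sketch below compares a chained comparison with its hand-expanded form. The counting helper middle() is hypothetical and exists only to show one of the "other details": as I understand the perlop documentation, the middle operand of a chain is evaluated once.

```perl
use v5.32;   # chained comparisons need Perl 5.32 or later
use warnings;

my $calls = 0;

# Hypothetical helper that counts how often it is evaluated.
sub middle { $calls++; return 5 }

# Chained form: reads like the mathematical notation 1 < y <= 10.
my $chained = 1 < middle() <= 10;

# Hand-expanded form: the middle value is stored first so that the
# helper is not called twice.
my $y        = middle();
my $expanded = 1 < $y && $y <= 10;

say 'chained:  ', ($chained  ? 'true' : 'false');
say 'expanded: ', ($expanded ? 'true' : 'false');
say "middle() was called $calls times in total";
```

Both forms should agree for any values. The chained one saves the temporary variable.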

Consider what this macro would do to this if it were to work:

$x =~ m/\./ =~ $z

This transformation gives you:

$x =~ m/\./ and m/\./ =~ $z

That is, the thing in the middle is on the right for the first operation and then on the left for the second. $x would bind to a match operator, then the result of a match operator would bind to $z, which would have to be a pre-compiled pattern, I guess.

ikegami already showed that you aren't actually chaining things. You want to test things in series.


I was generally against this feature, but it was introduced to Perl immediately, without going through an experimental cycle. In a related GitHub issue I raised, people went back and forth in their thinking about it (note that my comment that this would confuse people drew quite a high number of reactions but was effectively ignored). In the Perl 5 thread, some people thought people would be confused, and some people thought that would be impossible. The second group won.

I wasn't confused by or against this until after I wrote "Chain comparisons to avoid excessive typing" and saw the first comment on Reddit.

But here we are.

My advice is to not use chained comparisons. Even though the docs are correct, the feature is going to confuse regular users. It doesn't act like they want it to. People tend to guess what a programming language will do based on other things they know about it. Instead of just checking every case, they intuit, sometimes incorrectly, how something will or should act.

brian d foy answered Sep 26 '22 14:09