
How to make subrule/regex case-insensitive when used in match?

I am trying to match any keywords in a group. Keywords are in array @b. I am unable to make case-insensitive matches. I have done some testing, and the following script is an example:

> my $line = "this is a test line";
this is a test line

> my @b = < tes lin > ; 
[tes lin]

> my regex a { || @b };
regex a { || @b }

> say $line ~~ m:i/ <a> /    # matching the first as expected
「tes」
 a => 「tes」

> say $line ~~ m:i:g/ <a> /  # matching both as expected
(「tes」
 a => 「tes」 「lin」
 a => 「lin」)

> my @b = < tes LIN > ; 
[tes LIN]
> my regex a { || @b };
regex a { || @b }
> say $line ~~ m:i:g/ <a> /   # should match both "tes" and "LIN" but skips "LIN"
(「tes」
 a => 「tes」)

> my @b = < TES lin >
[TES lin]
> my regex a { || @b }
regex a { || @b }
> say $line ~~ m:i:g/ <a> /   # expect to match both but skips "TES"
(「lin」
 a => 「lin」)

Also, mapping to all lower cases does not work:

> my @b = < TES lin >.lc
[tes lin]
> my regex a { || @b }
regex a { || @b }
> say $line ~~ m:i:g/ <a> /
()

My question is, how should case-insensitivity be handled when a regex/subrule is actually called?

I tried putting the :i adverb inside regex a, but the resulting matches are all empty:

> my regex a { :i || @b }
regex a { :i || @b }
> say $line ~~ m:i:g/ <a> /
(「」
 a => 「」 「」

and then 19 lines of "a => 「」 「」"

 a => 「」)
lisprogtor asked Feb 11 '19 07:02


1 Answer

The problem with:

my regex a { || @b }
say $line ~~ m:i/ <a> /

is that a is the regex in charge of matching the values in @b, and it isn't compiled with :i. The :i on the outer m:i/ ... / applies only to that outer regex.
In Perl 6, regexes are code; you can't change how a regex works from a distance like that.

Then there is another problem with:

my regex a { :i || @b }

It is really compiled as:

my regex a {
     [ :i    ]
  ||
     [    @b ]
}

That is: match nothing, case-insensitively, and only if that fails (it won't fail) match one of the values in @b. This is why every match in your output is empty.

The only reason to use || @… is so that it matches the values in @… in the order they are defined.

> my @c = < abc abcd foo >;

> say 'abcd' ~~ / || @c /
「abc」

I think that in most cases it would actually work better to just let it be the default | semantics.

> my @c = < abc abcd foo >;

> say 'abcd' ~~ / |  @c /
「abcd」
> say 'abcd' ~~ /    @c /
「abcd」

That is, <a>|<b> will match whichever alternative gives the longest match, while <a>||<b> will try <a> first, and only if that fails will it try <b>.

So then this would work the way you want it to:

my regex a { :i @b }


If you really want || semantics, either of these would work:

my regex a {     ||  :i @b  }
my regex a { :i [||     @b] }

The following doesn't have || semantics.
In fact the || doesn't do anything.

my regex a {     || [:i @b] }

It is the same as these:

my regex a {     |   :i @b  }
my regex a {         :i @b  }
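
As a minimal sketch, here is the :i @b form applied to the data from the question. The variable name @found is only for illustration, and the sketch assumes a reasonably recent Rakudo in which :i applies to interpolated arrays.

```raku
my $line = "this is a test line";
my @b    = < TES lin >;      # mixed-case keywords, as in the question

# :i sits inside the regex that interpolates @b, so it governs those matches.
my regex a { :i @b }

# No :i on the outer match: it would not reach into <a> anyway.
my @found = ($line ~~ m:g/ <a> /).map(~*);   # stringify each Match
say @found;
```

If the keywords must be tried strictly in array order, swap in one of the || forms shown above instead.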
Brad Gilbert answered Sep 28 '22 16:09