How to negate/subtract regexes (not only character classes) in Perl 6?

Tags:

raku

It's possible to make a conjunction, so that the string matches 2 or more regex patterns.

> "banana" ~~ m:g/ . a && b . /
(｢ba｣)

It's also possible to negate a character class: if I want to match only consonants, I can take all the letters and subtract the character class of vowels:

> "camelia" ~~ m:g/ <.alpha> && <-[aeiou]> /
(｢c｣ ｢m｣ ｢l｣)

But what if I need to negate/subtract not a character class, but a regex of any length? Something like this:

> "banana" ~~ m:g/ . **3 && NOT ban / # doesn't work
(｢ana｣)

asked Nov 20 '17 16:11

Eugene Barsky

2 Answers

TL;DR Moritz's answer covers some important issues. This answer focuses on matching sub-strings per Eugene's comment ("I want to find substring(s) that match regex R, but don't match regex A.").

Write an assertion that says you are NOT sitting immediately before the regex you don't want to match and then follow that with the regex you do want to match:

say "banana" ~~ m:g/ <!before ban> . ** 3 / # (｢ana｣)

The before assertion is called a "zero width" assertion. This means that if it succeeds (which in this case means it does not "match" because we've written !before rather than just before), the matching position is not moved.

(Of course, if such an assertion fails and there's no alternative pattern that matches at the current match position, the match engine then steps forward one character position.)
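To make the "zero width" point concrete, here is a minimal sketch using the same "banana" string. The variable names are only illustrative.

```raku
# Positive lookahead: the assertion succeeds but consumes nothing,
# so the following `.` still matches the "b" at position 0.
my $pos = "banana" ~~ / <?before ban> . /;
say ~$pos;       # b
say $pos.from;   # 0

# Negative lookahead: it fails at position 0 (we are sitting before "ban"),
# so the engine retries from position 1, where `. ** 3` matches "ana".
my $neg = "banana" ~~ / <!before ban> . ** 3 /;
say ~$neg;       # ana
say $neg.from;   # 1
say $neg.to;     # 4
```

The `.from` and `.to` values show that the match itself covers only the three characters matched by `. ** 3`; the assertion contributes nothing to its length.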

It's possible that you want the patterns in the opposite order, with the positive match first and the negative second, as you showed in your question. (Perhaps the positive match is faster than the negative, so reversing their order will speed up the match.)

One way that will work for fairly simple patterns is using a negative after assertion:

say "banana" ~~ m:g/ . ** 3 <!after ban> / # (｢ana｣)

However, if the negative pattern is sufficiently complex, you may need to use this formulation:

say "banana" ~~ m:g/ . ** 3 && <!before ban> .*? / # (｢ana｣)

This inserts a && regex conjunction operator that, presuming the LHS pattern succeeds, tries the RHS as well after resetting the matching position (which is why the RHS now starts with <!before ban> rather than <!after ban>) and requires that the RHS matches the same length of input (which is why the <!before ban> is followed by the .*? "padding").
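As a rough sketch of why the padding matters, the following compares the padded conjunction with an unpadded one. The second form is only there for contrast; if the same-length requirement works as described above, it should find nothing, because its RHS is zero-width while the LHS is three characters long.

```raku
# With the `.*?` padding, the RHS can stretch to cover the same three
# characters that the LHS matched.
my $padded = ("banana" ~~ m:g/ . ** 3 && <!before ban> .*? /).map(*.Str).join(",");
say $padded;     # ana

# Without the padding, the RHS never consumes any characters,
# so it cannot cover the same span as the LHS.
my $unpadded-found = so "banana" ~~ m:g/ . ** 3 && <!before ban> /;
say $unpadded-found;
```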

answered Oct 24 '22 17:10

raiph

What does it even mean to "negate" a regex?

When you talk about the computer science definition of a regex, then it always needs to match a whole string. In this scenario, negation is pretty easy to define. But by default, regexes in Perl 6 search, so they don't have to match the whole string. This means you have to be careful to define what you mean by "negate".

If by negation of a regex A you mean a regex that matches whenever A does not match a whole string, and vice versa, you can indeed work with <!before ...>, but you need to be careful with anchoring: / ^ <!before A $ > .* / is this exact negation.
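A small sketch of this first definition, with the literal `ban` standing in for A (a hypothetical choice, as is the variable name):

```raku
# `ban` is a hypothetical stand-in for the regex A.
my $not-whole = rx/ ^ <!before ban $ > .* /;

say so "ban"    ~~ $not-whole;   # False: A matches the whole string
say so "banana" ~~ $not-whole;   # True:  A matches only a prefix
say so "apple"  ~~ $not-whole;   # True:  A does not match at all
```

If A contains a top-level alternation, group it in `[ ]` so that the `$` anchor applies to every branch.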

If by negation of a regex A you mean "only match if A matches nowhere in the string", you have to use something like / ^ [<!before A> .]* $ /.
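A small sketch of this second definition, with the literal `nan` standing in for A (again a hypothetical choice):

```raku
# `nan` is a hypothetical stand-in for the regex A.
# Each step checks that A does not start here, then consumes one character.
my $nowhere = rx/ ^ [ <!before nan> . ]* $ /;

say so "banana" ~~ $nowhere;   # False: "nan" occurs at position 2
say so "bandit" ~~ $nowhere;   # True:  "nan" occurs nowhere
```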

If you have another definition of negation in mind, please share it.

answered Oct 24 '22 17:10

moritz
