perl6 Negating multiple words and permutations of their chars inside a regex

Tags:

regex

permutation

negation

raku

What is the best way to perform, inside a regex, negation of multiple words and permutations of chars that make up those words?

For instance: I do not want

"zero dollar"
"roze dollar"
"eroz dollar"
"one dollar"
"noe dollar"
"oen dollar"

but I do want

"thousand dollar"
"million dollar"
"trillion dollar"

If I write

not m/ [one | zero] \s dollar /

it does not cover permutations of the chars, and the "not" outside makes the whole expression true for everything else, such as "big bang", which does not contain "dollar" at all.

m/ <- [one] | [zero] > \s dollar/ # this is a syntax error.

Thank you very much!

lisprog

asked Mar 01 '17 17:03

lisprogtor

2 Answers

Using a code assertion:

You could match any word, and then use a <!{ }> assertion to reject words that are permutations of "one" or "zero":

say "two dollar" ~~ / :s ^ (\w+) <!{ $0.comb.sort.join eq "eno" | "eorz" }> dollar $ /;
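
To make the trick visible: sorting the letters of a word gives every permutation of that word the same key, so the assertion only has to compare against one key per disallowed word. A minimal sketch of that idea, where the `accepts` helper is a hypothetical wrapper of mine and not part of the answer:

```raku
# Every permutation of a word collapses to the same sorted-letter key.
say "zero".comb.sort.join;   # eorz
say "roze".comb.sort.join;   # eorz
say "one".comb.sort.join;    # eno

# Hypothetical helper wrapping the regex from the answer unchanged.
sub accepts(Str $line) {
    so $line ~~ / :s ^ (\w+) <!{ $0.comb.sort.join eq "eno" | "eorz" }> dollar $ /;
}

# Print the verdict for a few rejected permutations and one accepted word.
for "zero dollar", "roze dollar", "oen dollar", "two dollar" -> $line {
    say "$line => { accepts($line) }";
}
```

Only the last line should be accepted; the other three are permutations of "zero" or "one".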

Using `before`/`after`:

Alternatively, you could pre-generate all permutations of the disallowed words, and then reject them using a <!before > or <!after > assertion in the regex:

my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;

say "two dollar" ~~ / :s ^ <!before @disallowed>\w+ dollar $ /;
say "two dollar" ~~ / :s ^ \w+<!after @disallowed> dollar $ /;
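
One thing to keep in mind with this variant is how quickly the pre-generated list grows, since a word with n distinct letters has n! permutations. A rough sketch to check the sizes, reusing the expression from above ("yellow" is just an example word I picked):

```raku
# The list from the answer: 3! permutations of "one" plus 4! of "zero".
my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;
say @disallowed.elems;   # 30

# A six-letter word with one repeated letter already yields 6!/2! entries.
say "yellow".comb.permutations».join.unique.elems;   # 360
```

For short words that is fine; for longer ones the code assertion is probably the cheaper option.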

answered Oct 11 '22 13:10

smls

Here's a solution that works well. It uses a helper sub, is-bad-word, that compares the sorted characters of the $needle (i.e. the word it found in the target string) against the sorted characters of each entry in @badwords; if any of them matches, the result is true in boolean context.

Inside the regex itself, I've used a negative code-assertion that passes the (\w+) that was matched into the helper sub.

One important thing to point out: if you don't properly anchor the (\w+) to the beginning of a word (I chose the beginning of the string this time), the regex engine will just skip ahead one character when it finds a bad word and accept the rest anyway (unless the bad word was only one character long to begin with, like in a dollar). After all, zero is in your @badwords, but ero isn't.

Hope that helps!

my @badwords = <one zero yellow>;

my @parsefails = q:to/EOF/.lines;
    zero dollar
    roze dollar
    erzo dollar
    one dollar
    noe dollar
    oen dollar
    yellow dollar
    wolley dollar
    EOF

my @parsepasses = q:to/EOF/.lines;
    thousand dollar
    million dollar
    dog dollar
    top dollar
    meme dollar
    EOF

sub is-bad-word($needle) {
    return $needle.comb.sort eq any(@badwords).comb.sort
}

use Test;
plan @parsefails + @parsepasses;

for flat (@parsefails X False), (@parsepasses X True) -> $line, $should-pass {
    my $succ = so $line ~~ / ^ (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
    ok $succ eqv $should-pass, "$line -> $should-pass";
}

done-testing;
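
To see the anchoring point in isolation, here is a small sketch comparing the unanchored and anchored regex on the same input. The helper is a variant of is-bad-word that joins the sorted letters into a string key first, and the `<<` (left word boundary) version is my own assumption about how one might handle phrases that are not at the start of the line; neither is part of the original test script.

```raku
my @badwords = <one zero yellow>;

# Variant of the answer's helper: compare joined sorted-letter keys.
sub is-bad-word($needle) {
    my $key = $needle.comb.sort.join;
    so $key eq any(@badwords.map(*.comb.sort.join));
}

# No anchor: after rejecting "zero" the engine retries one character later.
my $unanchored = "zero dollar" ~~ / (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
say ~$unanchored;        # ero dollar

# Anchored at the start of the string, as in the answer: no match.
my $anchored = "zero dollar" ~~ / ^ (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
say $anchored.defined;   # False

# Assumption: a left word boundary also prevents the skip-ahead.
my $boundary = "big zero dollar" ~~ / << (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
say $boundary.defined;   # False
```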

answered Oct 11 '22 13:10

timotimo
