
perl6 Negating multiple words and permutations of their chars inside a regex

What is the best way to perform, inside a regex, negation of multiple words and permutations of chars that make up those words?

For instance: I do not want

"zero dollar"
"roze dollar"
"eroz dollar"
"one dollar"
"noe dollar"
"oen dollar"

but I do want

"thousand dollar"
"million dollar"
"trillion dollar"

If I write

not m/ [one | zero] \s dollar /

it does not catch permutations of the chars, and the outer "not" makes the whole expression true for everything else, such as "big bang", which does not even contain the "dollar" from the regex.

m/ <- [one] | [zero] > \s dollar/ # this is a syntax error.

Thank you very much !

lisprog

lisprogtor asked Mar 01 '17 17:03



2 Answers

Using a code assertion:

You could match any word, and then use a <!{ }> assertion to reject words that are permutations of "one" or "zero":

say "two dollar" ~~ / :s ^ (\w+) <!{ $0.comb.sort.join eq "eno" | "eorz" }> dollar $ /;
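The comparison strings "eno" and "eorz" are the sorted characters of "one" and "zero". A minimal sketch of why this works, assuming a hypothetical helper named sorted-key that is not part of the answer: every permutation of a word sorts to the same key, so one string comparison covers all of them.

```raku
# Hypothetical helper (not from the answer): sort a word's characters
# to get a key that is identical for every permutation of that word.
sub sorted-key(Str $word) { $word.comb.sort.join }

say sorted-key("one");    # eno
say sorted-key("noe");    # eno
say sorted-key("zero");   # eorz
say sorted-key("roze");   # eorz
say sorted-key("two");    # otw
```

Because only the sorted key is compared, no list of permutations has to be generated.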

Using before/after:

Alternatively, you could pre-generate all permutations of the disallowed words, and then reject them using a <!before > or <!after > assertion in the regex:

my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;

say "two dollar" ~~ / :s ^ <!before @disallowed>\w+ dollar $ /;
say "two dollar" ~~ / :s ^ \w+<!after @disallowed> dollar $ /;
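To get a feel for what the pre-generated list holds, here is a small sketch. It builds the list with an explicit block and .map(*.join) rather than the hyper operator, which is a stylistic substitution on my part and is intended to produce the same strings.

```raku
# Build every distinct permutation of each disallowed word.
my @disallowed = <one zero>.map({
    .comb.permutations.map(*.join).Slip   # one joined string per ordering
}).unique;

say @disallowed.elems;            # 30, i.e. 3! + 4! orderings
say "noe" (elem) @disallowed;     # True
say "two" (elem) @disallowed;     # False
```

The list grows factorially with word length, so this approach is best kept to short words. The .unique call only matters for words with repeated letters.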
smls answered Oct 11 '22 13:10


Here's a solution that works well. It uses a helper sub, is-bad-word, which sorts the characters of the $needle (i.e. the word the regex captured from the target string) and compares them against the sorted characters of each entry in @badwords. If any of them matches, it returns a true value.

Inside the regex itself, I've used a negative code-assertion that passes the (\w+) that was matched into the helper sub.

One important thing to point out: if you don't properly anchor the (\w+) to the beginning of a word (I chose the beginning of the string this time), the regex engine will, after rejecting a bad word, simply retry one character further along and accept the remainder (unless the bad word is only one character long to begin with, like "a" in "a dollar"). After all, zero is in your @badwords, but ero isn't.

Hope that helps!

my @badwords = <one zero yellow>;

my @parsefails = q:to/EOF/.lines;
    zero dollar
    roze dollar
    erzo dollar
    one dollar
    noe dollar
    oen dollar
    yellow dollar
    wolley dollar
    EOF

my @parsepasses = q:to/EOF/.lines;
    thousand dollar
    million dollar
    dog dollar
    top dollar
    meme dollar
    EOF

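# True if $needle has the same sorted characters as any entry in @badwords,
# i.e. if it is a permutation of a bad word.
sub is-bad-word($needle) {
    return $needle.comb.sort eq any(@badwords).comb.sort
}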

use Test;
plan @parsefails + @parsepasses;

for flat (@parsefails X False), (@parsepasses X True) -> $line, $should-pass {
    my $succ = so $line ~~ / ^ (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
    ok $succ eqv $should-pass, "$line -> $should-pass";
}

done-testing;
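The anchoring caveat described above can be seen in a short sketch. The @bad-keys array and this version of is-bad-word are simplified stand-ins for the code above, limited to two bad words for illustration.

```raku
# Sorted-character keys of the bad words (illustrative stand-in for @badwords).
my @bad-keys = <one zero>.map(*.comb.sort.join);

sub is-bad-word(Str $needle) {
    so $needle.comb.sort.join eq any(@bad-keys)
}

# Anchored: the capture must start at the beginning of the string.
say so "zero dollar" ~~ / ^ (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;   # False

# Unanchored: after rejecting "zero", the engine retries from the next
# character, captures "ero", and accepts it.
my $m = "zero dollar" ~~ / (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
say $m.Str;      # ero dollar
say $m[0].Str;   # ero
```

If the word can appear in the middle of a longer string, a left word boundary before the capture should serve the same purpose as the ^ anchor.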
timotimo answered Oct 11 '22 13:10