What is the best way, inside a regex, to negate several words together with every permutation of the characters that make up those words?
For instance: I do not want
"zero dollar"
"roze dollar"
"eroz dollar"
"one dollar"
"noe dollar"
"oen dollar"
but I do want
"thousand dollar"
"million dollar"
"trillion dollar"
If I write
not m/ [one | zero] \s dollar /
it does not cover permutations of the characters, and the outer "not" makes the test succeed for everything else, including strings like "big bang" that do not contain "dollar" at all.
m/ <- [one] | [zero] > \s dollar/ # this is a syntax error.
Thank you very much!
lisprog
You could match any word, and then use a <!{ }> assertion to reject words that are permutations of "one" or "zero":
say "two dollar" ~~ / :s ^ (\w+) <!{ $0.comb.sort.join eq "eno" | "eorz" }> dollar $ /;
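The one-liner above hard-codes the sorted letters "eno" and "eorz". As a minimal sketch of the same idea, the sorted forms could be computed from the word list instead. The names canonical, accepts, @disallowed-words and $bad-keys are hypothetical and not part of the original answer:

```raku
# Hypothetical helper: a word's canonical form is its characters in sorted
# order, so every permutation of a word maps to the same string.
sub canonical(Str $word --> Str) { $word.comb.sort.join }

# Assumed word list; the canonical forms are computed instead of typed by hand.
my @disallowed-words = <one zero>;
my $bad-keys = @disallowed-words.map(&canonical).Set;

# Hypothetical wrapper around the regex from the answer: capture a word, then
# reject it if its canonical form is one of the disallowed keys.
sub accepts(Str $line --> Bool) {
    so $line ~~ / :s ^ (\w+) <!{ canonical(~$0) (elem) $bad-keys }> dollar $ /
}

for "two dollar", "noe dollar", "roze dollar", "million dollar" -> $line {
    say "$line => { accepts($line) }";
}
```

With this list, "noe dollar" and "roze dollar" should be rejected and the other two lines accepted. Adding a word to @disallowed-words is then the only change needed to forbid all of its permutations.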
Alternatively, you could pre-generate all permutations of the disallowed words, and then reject them using a <!before > or <!after > assertion in the regex:
my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;
say "two dollar" ~~ / :s ^ <!before @disallowed>\w+ dollar $ /;
say "two dollar" ~~ / :s ^ \w+<!after @disallowed> dollar $ /;
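To make the pre-generated list concrete, here is a small sketch that counts it and runs the lookahead variant over a few lines. The wrapper name accepts-lookahead and the test strings are assumptions added for illustration. Note that <!before @disallowed> only inspects the start of the word, so a longer word that merely begins with a permutation (such as "neon", which starts with "neo") is expected to be rejected as well:

```raku
# Same construction as in the answer: all permutations of each disallowed word.
my @disallowed = <one zero>.map(|*.comb.permutations)».join.unique;
say @disallowed.elems;   # 3! + 4! = 30 strings; the list grows factorially with word length

# Hypothetical wrapper around the lookahead version of the regex.
sub accepts-lookahead(Str $line --> Bool) {
    so $line ~~ / :s ^ <!before @disallowed>\w+ dollar $ /
}

for "one dollar", "noe dollar", "eroz dollar", "thousand dollar", "neon dollar" -> $line {
    say "$line => { accepts-lookahead($line) }";
}
```

If words like "neon" must pass, the code-assertion approach, which compares the whole captured word, avoids this prefix effect.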
Here's a solution that works well. It uses a helper sub, is-bad-word, that compares the sorted characters of the $needle (i.e. what it found in the target string) against the sorted characters of each entry in @badwords, and if any of them matches, it returns True.
Inside the regex itself, I've used a negative code assertion that passes the word matched by (\w+) into the helper sub.
One important thing to point out: if you don't properly anchor the (\w+) to the beginning of a word (I chose the beginning of the string this time), the regex will just skip ahead one character when it finds a bad word and accept anyway (unless the bad word was only one character to begin with, like in "a dollar"). After all, zero is in your @badwords, but ero isn't.
Hope that helps!
my @badwords = <one zero yellow>;
my @parsefails = q:to/EOF/.lines;
zero dollar
roze dollar
erzo dollar
one dollar
noe dollar
oen dollar
yellow dollar
wolley dollar
EOF
my @parsepasses = q:to/EOF/.lines;
thousand dollar
million dollar
dog dollar
top dollar
meme dollar
EOF
sub is-bad-word($needle) {
    return $needle.comb.sort eq any(@badwords).comb.sort
}
use Test;
plan @parsefails + @parsepasses;
for flat (@parsefails X False), (@parsepasses X True) -> $line, $should-pass {
    my $succ = so $line ~~ / ^ (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;
    ok $succ eqv $should-pass, "$line -> $should-pass";
}
done-testing;
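The anchoring caveat mentioned before the test script can be seen in isolation. The following is a minimal standalone sketch that reuses the helper from the answer with a shorter word list; the variable names $loose and $anchored are hypothetical:

```raku
my @badwords = <one zero>;

# Same comparison as in the answer, collapsed to a plain Bool with "so".
sub is-bad-word($needle) {
    so $needle.comb.sort eq any(@badwords).comb.sort
}

# Unanchored: after "zero" is rejected at position 0, the engine retries one
# character later, captures "ero" (not a bad word), and the match succeeds.
my $loose    = / (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;

# Anchored at the start of the string, as in the answer.
my $anchored = / ^ (\w+) \s <!{ is-bad-word($0.Str) }> 'dollar' /;

say so "zero dollar" ~~ $loose;      # True: a false accept
say so "zero dollar" ~~ $anchored;   # False: rejected as intended
```

Anchoring with ^ is the choice made in the answer; any anchor that pins (\w+) to the start of a word should have the same effect.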