I am trying to match any word that is not completely composed of capitals or lowercase letters, and I have the following regex written:
if ($line =~ /(?!^[A-Z][A-Z]+(\s*)$)(?!^[a-z][a-z]+(\s*)$)/) {
print $line;
}
The expression below should match words with all capital letters
(?!^[A-Z][A-Z]+(\s*)$)
and this should match words with all lowercase letters
(?!^[a-z][a-z]+(\s*)$)
I combine both and test the result against the following words: ASDSFSDF, asdfasdfasdf, and asdasdfFFFdsfs. I notice that it matches everything. Only when I move the caret outside the parentheses, as in:
/^(?![A-Z][A-Z]+(\s*)$)^(?![a-z][a-z]+(\s*)$)/
do I see that it matches only asdasdfFFFdsfs. Can someone explain why I need to move the anchor outside of the negative lookahead expression? I am new to regular expressions and I am confused.
Thanks.
You fell into a trap of multiple negations and anchoring, and your resulting regex didn't quite do what you want. Let's assume we only have the simplified regex /(?!^[A-Z]$)/ and the string "1".
At the first position (before the 1), the assertion is tested. The ^ matches here, but [A-Z] does not. Therefore, ^[A-Z] fails. As the lookahead is negative, the whole pattern succeeds.
Now let's assume we have the string "A". At the first position, the assertion is tested. The pattern ^[A-Z]$ matches here. Because it is a negative lookahead, the assertion fails.
Then, the second position is tested (after the A). The assertion is tested, but ^ doesn't match here, so the negative assertion makes the pattern succeed!
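To see this walk-through in action, here is a minimal sketch that reports the offset at which the simplified regex first matches. The helper name first_match_offset is only an illustration, not something from your code; it relies on Perl's @- array, whose first element holds the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset where $re first matches in $str,
# or undef if it does not match at all.
sub first_match_offset {
    my ($re, $str) = @_;
    return $str =~ $re ? $-[0] : undef;
}

# The simplified regex, with the anchor inside the negative lookahead.
my $unanchored = qr/(?!^[A-Z]$)/;

for my $str ("1", "A") {
    my $pos = first_match_offset($unanchored, $str);
    printf "%-3s first matches at offset %s\n",
        qq("$str"), defined $pos ? $pos : "(no match)";
}
```

For "1" the match is found at offset 0, and for "A" it is found at offset 1, i.e. after the A, exactly as described above.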
Therefore, your regex can always find some position where it succeeds, so it matches every string instead of filtering the way you wanted. You can suppress this behaviour by anchoring outside the assertion: /^(?![A-Z]$)/ in this case. Note that in your case, the easiest solution is to write a regex that matches all inputs you don't want, and then negate that result:
print $line unless $line =~ /^(?:[A-Z]{2,}|[a-z]{2,})\s*$/;
(Edit: actually TLP's 2nd solution is even simpler, and likely more efficient)
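As a rough check, the sketch below runs both the negated regex and a variant of your regex with the anchor outside the lookaheads against the three words from the question. The sub names is_mixed_negated and is_mixed_anchored are made up for this illustration, and I have written [A-Z]{2,} in place of [A-Z][A-Z]+, which is equivalent.

```perl
use strict;
use warnings;

# Hypothetical helper: true unless the word consists only of two or more
# uppercase letters, or only of two or more lowercase letters
# (optionally followed by whitespace).
sub is_mixed_negated {
    my ($word) = @_;
    return $word !~ /^(?:[A-Z]{2,}|[a-z]{2,})\s*$/ ? 1 : 0;
}

# Hypothetical helper: the same check, written with the anchor outside
# the two negative lookaheads.
sub is_mixed_anchored {
    my ($word) = @_;
    return $word =~ /^(?![A-Z]{2,}\s*$)(?![a-z]{2,}\s*$)/ ? 1 : 0;
}

for my $word (qw(ASDSFSDF asdfasdfasdf asdasdfFFFdsfs)) {
    printf "%-16s negated=%d anchored=%d\n",
        $word, is_mixed_negated($word), is_mixed_anchored($word);
}
```

Both checks reject ASDSFSDF and asdfasdfasdf and accept asdasdfFFFdsfs. Note that both also accept strings containing no letters at all, such as "123", since those are neither all uppercase nor all lowercase.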
How about just checking the string for the upper and lower case characters?
(?=.*[A-Z])(?=.*[a-z])
As you see, this will not match strings consisting of only one case, because both lookaheads must match.
Of course, this is just a complicated way of performing two regex matches and combining the result:
if ($line =~ /[A-Z]/ and $line =~ /[a-z]/) { print $line; }
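Here is a small sketch comparing the two forms, assuming the input is a single word with no embedded newlines (the dot in .* does not match a newline by default). The sub names are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains at least one uppercase
# and at least one lowercase ASCII letter, using two separate matches.
sub has_both_cases {
    my ($line) = @_;
    return ($line =~ /[A-Z]/ && $line =~ /[a-z]/) ? 1 : 0;
}

# Hypothetical helper: the same check as a single regex with two
# positive lookaheads, both tested from the same position.
sub has_both_cases_lookahead {
    my ($line) = @_;
    return $line =~ /(?=.*[A-Z])(?=.*[a-z])/ ? 1 : 0;
}

for my $word (qw(ASDSFSDF asdfasdfasdf asdasdfFFFdsfs)) {
    printf "%-16s two-match=%d lookahead=%d\n",
        $word, has_both_cases($word), has_both_cases_lookahead($word);
}
```

Be aware that this is a slightly different test from rejecting single-case words: a string such as "abc123" is not all lowercase letters, yet it fails this check because it has no uppercase letter.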