I have been looking through SO and although this question has been answered in one scenario:
Regex to match all words except a given list
It's not quite what I'm looking for. I am trying to write a regular expression which matches any string of the form [\w]+[(], but which doesn't match the three strings "cat(", "dog(" and "sheep(" specifically.
I have been playing with lookahead and lookbehind, but I can't quite get there. I may be overcomplicating this, so any help would be greatly appreciated.
If you want to exclude a certain word or string from a search pattern, a good way to do this is with a regular expression lookaround assertion. It is indispensable if you want to match something that is not followed by something else. (?=...) is a positive lookahead and (?!...) is a negative lookahead.
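To make the difference between the two lookaheads concrete, here is a minimal Python sketch. The sample text and the variable names are made up for illustration; only the (?=...) and (?!...) syntax comes from the explanation above.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "price: 100USD, 200EUR, 300USD"

# Positive lookahead: digit runs that ARE followed by "USD".
# The "USD" itself is checked but not consumed, so it is not part of the match.
usd_amounts = re.findall(r"\d+(?=USD)", text)

# Negative lookahead: digit runs that are NOT followed by "USD".
# \b and (?!\d) pin the match to the whole number, so the engine cannot
# backtrack to a shorter run such as "10" to sneak past the assertion.
non_usd_amounts = re.findall(r"\b\d+(?!\d)(?!USD)", text)

print(usd_amounts)      # ['100', '300']
print(non_usd_amounts)  # ['200']
```

Note the extra anchoring in the negative case: a bare \d+(?!USD) would happily match "10" out of "100USD", which is a common surprise with negative lookaheads.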
Example: The regex "aa\n" tries to match two consecutive "a"s at the end of a line, including the newline character itself. Example: "a\+" matches the literal text "a+" and not a series of one or more "a"s. The caret ^ is the anchor for the start of the string or, as the first character inside a character class, the negation symbol. Example: "^a" matches "a" at the start of the string.
Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
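As a small illustration of the "whole words only" behaviour, the following Python sketch compares a search with and without \b. The sample sentence is an assumption made up for this example.

```python
import re

# Hypothetical sample text: "cat" appears alone, inside other words, and before "(".
text = "cat concat category cat( bobcat"

# With \b on both sides, "cat" only matches where it is a whole word.
# "(" and the space are non-word characters, so "cat(" still counts.
whole_word_hits = re.findall(r"\bcat\b", text)

# Without \b, every occurrence of the three letters matches.
substring_hits = re.findall(r"cat", text)

print(len(whole_word_hits))  # 2
print(len(substring_hits))   # 5
```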
If the regular expression implementation supports look-ahead or look-behind assertions, you could use the following:
Using a negative look-ahead assertion:
\b(?!(?:cat|dog|sheep)\()\w+\(
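For what it's worth, here is how the look-ahead version might be tried out in Python. This is only a sketch: the sample string and the name LOOKAHEAD are assumptions for illustration, and the pattern itself is the one given above.

```python
import re

# The look-ahead pattern from the answer, compiled as a raw string.
LOOKAHEAD = re.compile(r"\b(?!(?:cat|dog|sheep)\()\w+\(")

# Hypothetical input mixing forbidden and allowed names.
sample = "cat(1) dog(2) sheep(3) cow(4) catdog(5) cats(6)"

matches = LOOKAHEAD.findall(sample)
print(matches)  # ['cow(', 'catdog(', 'cats(']
```

Only the exact names cat(, dog( and sheep( are rejected; longer names that merely start or end with one of them still match.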
Using a negative look-behind assertion:
\b\w+\((?<!\b(?:cat|dog|sheep)\()
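One caveat if you try the look-behind version in Python: as far as I know, the built-in re module only accepts fixed-width look-behind, so the alternation above (three letters versus five) is rejected there. A possible workaround, sketched below, is to split it into one fixed-width look-behind per word. The name LOOKBEHIND and the sample string are assumptions for illustration.

```python
import re

# One fixed-width negative look-behind per forbidden name.
# Each assertion looks back over exactly "name(" and requires a word
# boundary in front of the name, mirroring the single assertion above.
LOOKBEHIND = re.compile(
    r"\b\w+\("
    r"(?<!\bcat\()"
    r"(?<!\bdog\()"
    r"(?<!\bsheep\()"
)

# Hypothetical input mixing forbidden and allowed names.
sample = "cat(1) dog(2) sheep(3) cow(4) catdog(5) cats(6)"

matches = LOOKBEHIND.findall(sample)
print(matches)  # ['cow(', 'catdog(', 'cats(']
```

Engines with variable-width look-behind (for example .NET) should accept the original single-assertion form as written.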
I added the \b anchor, which marks a word boundary, so catdog( would be matched although it contains dog(.
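To see what the \b buys you, here is a short Python sketch comparing the look-ahead pattern with and without the boundary. The variable names and the test string are made up for illustration.

```python
import re

# Look-ahead pattern with the leading word boundary.
with_boundary = re.compile(r"\b(?!(?:cat|dog|sheep)\()\w+\(")

# Same pattern with the \b removed, for comparison only.
without_boundary = re.compile(r"(?!(?:cat|dog|sheep)\()\w+\(")

text = "cat( catdog("

bounded = with_boundary.findall(text)
unbounded = without_boundary.findall(text)

print(bounded)    # ['catdog(']
print(unbounded)  # ['at(', 'catdog(']
```

Without \b the engine simply steps one character into "cat(" and matches "at(", so the exclusion is silently bypassed. With \b a match can only start at the beginning of a word.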
But while look-ahead assertions are more widely supported by regex implementations, the regex with the look-behind assertion can be more efficient, since the assertion is only tested if the preceding regex (in our case \b\w+\() has already matched. The look-ahead assertion, however, is tested before the actual regex matches, so in our case it is tested whenever \b matches.
Do you really require this in a single regex? If not, then the simplest implementation is just two regexes - one to check you don't match one of your forbidden words, and one to match your \w+, chained with a logical AND.
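A minimal Python sketch of that two-regex idea, assuming each candidate string is checked on its own. The names CALL, FORBIDDEN and is_allowed_call are hypothetical, not from any library.

```python
import re

# Regex 1: the general shape, one or more word characters followed by "(".
CALL = re.compile(r"\w+\(")

# Regex 2: the exact strings that must be rejected.
FORBIDDEN = re.compile(r"(?:cat|dog|sheep)\(")


def is_allowed_call(candidate: str) -> bool:
    """Return True if candidate looks like name( and is not a forbidden name."""
    # Logical AND of "matches the shape" and "is not on the forbidden list".
    return (
        CALL.fullmatch(candidate) is not None
        and FORBIDDEN.fullmatch(candidate) is None
    )


print(is_allowed_call("cow("))     # True
print(is_allowed_call("cat("))     # False
print(is_allowed_call("catdog("))  # True
```

This avoids lookaround entirely, so it should work in any regex flavour, at the cost of running two matches per candidate.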