
Regex to match a pattern, but exclude a set of words

Tags:

regex

I have been looking through SO and although this question has been answered in one scenario:

Regex to match all words except a given list

It's not quite what I'm looking for. I am trying to write a regular expression which matches any string of the form [\w]+[(], but which doesn't match the three strings "cat(", "dog(" and "sheep(" specifically.

I have been playing with lookahead and lookbehind, but I can't quite get there. I may be overcomplicating this, so any help would be greatly appreciated.

Huguenot asked Jul 23 '09 16:07





2 Answers

If the regular expression implementation supports look-ahead or look-behind assertions, you could use the following:

  • Using a negative look-ahead assertion:

     \b(?!(?:cat|dog|sheep)\()\w+\(
    
  • Using a negative look-behind assertion:

     \b\w+\((?<!\b(?:cat|dog|sheep)\()
    

I added the \b anchor, which marks a word boundary, so catdog( is still matched even though it contains dog(.
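The look-ahead version can be tried out with a short Python sketch. The sample string and the helper name find_calls are made up for illustration; only the pattern itself comes from the answer above.

```python
import re

# Negative look-ahead: at a word boundary, refuse to start a match if the
# text ahead is exactly cat(, dog( or sheep(.
PATTERN = re.compile(r"\b(?!(?:cat|dog|sheep)\()\w+\(")

def find_calls(text):
    """Return every word( token in text that is not an excluded word."""
    return PATTERN.findall(text)

# Hypothetical sample input, not from the original question.
sample = "cat( dog( sheep( catdog( bird( cats("
print(find_calls(sample))  # ['catdog(', 'bird(', 'cats(']
```

catdog( and cats( survive because the look-ahead only rejects a match when one of the excluded words is followed immediately by the opening parenthesis.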

Look-ahead assertions are more widely supported by regex implementations, but the regex with the look-behind assertion is more efficient: the look-behind is only tested once the preceding pattern (in our case \b\w+\() has already matched. The look-ahead, by contrast, is tested before the rest of the pattern is attempted, so in our case it is evaluated every time \b matches.
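One caveat on support: Python's standard re module only accepts fixed-width look-behind, so the look-behind pattern above does not compile there as written, because cat and sheep differ in length. A possible workaround, sketched below under that assumption, is to chain one fixed-width look-behind per excluded word. The names compiles and find_calls_lookbehind are hypothetical helpers for this example.

```python
import re

# The look-behind from the answer, unchanged. Its alternatives differ in
# length (3 vs 5 characters), which Python's re module rejects.
VARIABLE_WIDTH = r"\b\w+\((?<!\b(?:cat|dog|sheep)\()"

# Fixed-width rewrite: one negative look-behind per excluded word.
FIXED_WIDTH = re.compile(r"\b\w+\((?<!\bcat\()(?<!\bdog\()(?<!\bsheep\()")

def compiles(pattern):
    """Return True if this engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def find_calls_lookbehind(text):
    """Return every word( token in text that is not an excluded word."""
    return FIXED_WIDTH.findall(text)

print(compiles(VARIABLE_WIDTH))  # False
print(find_calls_lookbehind("cat( dog( sheep( catdog( bird("))  # ['catdog(', 'bird(']
```

Engines that allow alternatives of different lengths inside a look-behind can use the single-assertion form from the answer directly.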

Gumbo answered Sep 30 '22 18:09



Do you really need this in a single regex? If not, the simplest implementation is just two regexes: one to check that the input is not one of your forbidden words, and one to match your \w+\( pattern, combined with a logical AND.
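A minimal Python sketch of that two-regex idea follows. The names CANDIDATE, FORBIDDEN, is_allowed and find_allowed are invented for this example, and it assumes the forbidden words are exactly cat, dog and sheep as in the question.

```python
import re

# First regex: the general shape we want to accept.
CANDIDATE = re.compile(r"\b\w+\(")
# Second regex: the tokens we want to reject.
FORBIDDEN = re.compile(r"(?:cat|dog|sheep)\(")

def is_allowed(token):
    """True if token has the word( shape AND is not a forbidden word."""
    return (CANDIDATE.fullmatch(token) is not None
            and FORBIDDEN.fullmatch(token) is None)

def find_allowed(text):
    """Collect word( tokens from text, dropping the forbidden ones."""
    return [m for m in CANDIDATE.findall(text) if FORBIDDEN.fullmatch(m) is None]

print(is_allowed("bird("))                     # True
print(is_allowed("dog("))                      # False
print(find_allowed("cat( catdog( bird("))      # ['catdog(', 'bird(']
```

Using fullmatch for the forbidden check keeps catdog( acceptable, since only an exact match against a forbidden word disqualifies a token.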

ire_and_curses answered Sep 30 '22 16:09
