
RegExp exclusion, looking for a word not followed by another

Tags:

regex

I am trying to search for all occurrences of "Tom" which are not followed by "Thumb".

I have tried to look for

Tom ^((?!Thumb).)*$ 

but I still get the lines that contain "Tom Thumb".

user1364539 asked Apr 29 '12 19:04




2 Answers

In case you are not looking for whole words, you can use the following regex:

Tom(?!.*Thumb) 

If there are several words to rule out after a wanted match, you may use either of

Tom(?!.*(?:Thumb|Finger|more words here))
Tom(?!.*Thumb)(?!.*Finger)(?!.*more words here)
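
As a rough check of the two multi-word forms, here is a minimal sketch using Python's `re` module. The choice of Python is an assumption (the question does not name a regex flavor), and the sample lines and the helper `matching_lines` are made up for illustration.

```python
import re

# Two ways to reject "Tom" when any listed word appears later on the same line.
# The word list and the sample lines are illustrative, not from the question.
alternation = re.compile(r"Tom(?!.*(?:Thumb|Finger))")
chained = re.compile(r"Tom(?!.*Thumb)(?!.*Finger)")

lines = [
    "Tom Sawyer went fishing",
    "Tom Thumb was small",
    "Tom pointed a Finger",
    "Tomorrow is Monday",
]

def matching_lines(pattern, candidates):
    """Return the lines in which the pattern finds at least one match."""
    return [line for line in candidates if pattern.search(line)]

alternation_hits = matching_lines(alternation, lines)
chained_hits = matching_lines(chained, lines)
print(alternation_hits)
print(chained_hits)
```

Both patterns should select the same lines here. Note that "Tomorrow is Monday" is selected too, because these patterns do not require Tom to be a whole word.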

To make . match line breaks, please refer to "How do I match any character across multiple lines in a regular expression?"
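
How the line-break behavior is switched on depends on the engine. As one example, assuming Python's `re` module, the `re.DOTALL` flag makes `.` match newlines as well; the two-line sample text below is made up.

```python
import re

text = "Tom went out\nand met Thumb"  # made-up two-line sample

# By default "." stops at the newline, so "Thumb" on the next line is not seen
# by the lookahead and "Tom" is still matched.
default_match = re.search(r"Tom(?!.*Thumb)", text)

# With re.DOTALL, "." also matches "\n", so the lookahead reaches "Thumb"
# on the second line and rejects "Tom".
dotall_match = re.search(r"Tom(?!.*Thumb)", text, flags=re.DOTALL)

print(default_match is not None)
print(dotall_match is None)
```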

See this regex demo

If you are looking for whole words (i.e. a whole word Tom should only be matched if there is no whole word Thumb further to the right of it), use

\bTom\b(?!.*\bThumb\b) 

See another regex demo

Note that:

  • \b - matches a leading/trailing word boundary
  • (?!.*Thumb) - is a negative lookahead that fails the match if Thumb appears anywhere to the right of the current position, after zero or more characters (whether those characters may include line breaks depends on the engine).
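
To see what the word boundaries change, here is a small sketch, again assuming Python's `re` module; the sample strings are invented for the comparison.

```python
import re

whole_word = re.compile(r"\bTom\b(?!.*\bThumb\b)")
substring = re.compile(r"Tom(?!.*Thumb)")

samples = [
    "Tom likes Thumbelina",   # "Thumb" occurs only inside a longer word
    "Tomorrow never comes",   # "Tom" occurs only inside a longer word
    "Tom met Thumb",          # both occur as whole words
]

# Map each sample to whether each pattern finds a match in it.
whole_word_results = {s: bool(whole_word.search(s)) for s in samples}
substring_results = {s: bool(substring.search(s)) for s in samples}

for s in samples:
    print(s, whole_word_results[s], substring_results[s])
```

With the boundaries, "Tom likes Thumbelina" matches because Thumbelina is not the whole word Thumb, and "Tomorrow never comes" does not match because Tom is not a whole word there. Without the boundaries, the outcome for those two samples is reversed.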
Wiktor Stribiżew answered Oct 02 '22 10:10


You don't say what flavor of regex you're using, but this should work in general:

 Tom(?!\s+Thumb) 
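
This pattern only rejects a Tom that is directly followed by whitespace and Thumb, which is a narrower test than looking for Thumb anywhere later on the line. A minimal sketch of the difference, assuming Python's `re` module and a made-up sample line:

```python
import re

adjacent = re.compile(r"Tom(?!\s+Thumb)")   # Thumb must not come right after Tom
anywhere = re.compile(r"Tom(?!.*Thumb)")    # Thumb must not appear later on the line

line = "Tom Sawyer read about Tom Thumb"

# Record the start offset of every "Tom" each pattern accepts.
adjacent_starts = [m.start() for m in adjacent.finditer(line)]
anywhere_starts = [m.start() for m in anywhere.finditer(line)]

print(adjacent_starts)
print(anywhere_starts)
```

The adjacent form keeps the first Tom (the one before Sawyer) and drops the second, while the anywhere form drops both because Thumb occurs later on the same line. Which one is wanted depends on how "not followed by" is meant in the question.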
alan answered Oct 02 '22 12:10