
RegEx: Find pattern but exclude words

Tags:

regex

I want to find all connected words except for specific ones. For example:

0827banana82/+wine22green-729

green and wine should match, but banana not.

I tried the following regular expression with a negative lookahead:

(?!banana)([a-zA-Z]+)

but it only blocks a match that starts at the b of banana. The engine then moves one character forward, and anana is still a match for the second part of the pattern. How can I prevent that?
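The behaviour can be reproduced in a few lines. This is a minimal sketch that assumes Python's re module, which the question does not name. The variable names are illustrative only.

```python
import re

text = "0827banana82/+wine22green-729"

# The lookahead only rejects a match that STARTS at the "b" of "banana".
# One character later the lookahead succeeds, so "anana" slips through.
attempt = re.findall(r"(?!banana)([a-zA-Z]+)", text)
print(attempt)  # ['anana', 'wine', 'green']
```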

YoungMath asked Jun 11 '21 08:06



4 Answers

You may add a negative lookbehind in your regex to make it work:

(?!banana)(?<![a-zA-Z])[a-zA-Z]+


RegEx Details:

  • (?!banana): Negative lookahead to assert that we don't have string banana ahead of the current position
  • (?<![a-zA-Z]): Negative lookbehind to assert that we don't have a letter before current position
  • [a-zA-Z]+: Match 1+ letters

PS: The pattern above also rejects longer words that start with banana, such as bananas. If you want to allow those, use:

(?!banana(?![a-zA-Z]))(?<![a-zA-Z])[a-zA-Z]+
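Both patterns can be compared side by side. The sketch below assumes Python's re module. The sample string is the one from the question with an extra word, bananas, appended for illustration.

```python
import re

# Question's sample plus an extra word (an assumption added for illustration).
text = "0827banana82/+wine22green-729 bananas"

# Variant 1: reject every word that starts with "banana".
strict = re.compile(r"(?!banana)(?<![a-zA-Z])[a-zA-Z]+")
# Variant 2: reject only the exact word "banana".
exact_only = re.compile(r"(?!banana(?![a-zA-Z]))(?<![a-zA-Z])[a-zA-Z]+")

strict_words = strict.findall(text)
exact_words = exact_only.findall(text)

print(strict_words)  # ['wine', 'green']
print(exact_words)   # ['wine', 'green', 'bananas']
```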

anubhava answered Nov 15 '22 08:11


Well you can use this one:

(banana)|([a-zA-Z]+)

This captures banana in the 1st group and all the other words in the 2nd, so you only need to read group 2.
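Reading only the second group might look like the sketch below, which assumes Python's re module. The name wanted is illustrative.

```python
import re

text = "0827banana82/+wine22green-729"
pattern = re.compile(r"(banana)|([a-zA-Z]+)")

# Keep a match only when group 2 took part in it;
# for "banana" group 2 is None and the match is dropped.
wanted = [m.group(2) for m in pattern.finditer(text) if m.group(2) is not None]
print(wanted)  # ['wine', 'green']
```

Because banana is tried first at every position, a longer word such as bananas is split into banana (group 1) and s (group 2) with this pattern.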

zipa answered Nov 15 '22 08:11


Another variation might be matching the characters a-zA-Z until there are no more. Then assert that banana is not directly to the left.

[a-zA-Z]+(?![a-zA-Z])(?<!banana)

The pattern matches

  • [a-zA-Z]+ Match 1+ chars a-zA-Z
  • (?![a-zA-Z]) Negative lookahead, assert not a-zA-Z directly to the right
  • (?<!banana) Negative lookbehind, assert banana not directly to the left



If you want to match bananas or straightbanana, you can assert that what is directly to the left is not banana standing on its own, i.e. banana that is not itself preceded by a char a-zA-Z

[a-zA-Z]+(?![a-zA-Z])(?<!(?<![a-zA-Z])banana)
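The difference between the two patterns shows up on words that end in banana. The sketch below assumes Python's re module, which accepts both lookbehinds because each has a fixed width. The two extra words in the sample string are added for illustration.

```python
import re

# Question's sample plus two extra words (assumptions added for illustration).
text = "0827banana82/+wine22green-729 bananas straightbanana"

# Rejects every run of letters that ends in "banana".
ends_with = re.compile(r"[a-zA-Z]+(?![a-zA-Z])(?<!banana)")
# Rejects only "banana" as a complete run of letters.
whole_word = re.compile(r"[a-zA-Z]+(?![a-zA-Z])(?<!(?<![a-zA-Z])banana)")

first = ends_with.findall(text)
second = whole_word.findall(text)

print(first)   # ['wine', 'green', 'bananas']
print(second)  # ['wine', 'green', 'bananas', 'straightbanana']
```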



As suggested by @bobble bubble in the comments, the pattern can be shortened if possessive quantifiers are supported and case-insensitive matching is enabled:

[a-z]++(?<!(?<![a-z])banana)

  • [a-z]++ Match 1+ chars in the range a-z (possessive, does not backtrack)
  • (?<! Negative lookbehind, assert that what is directly to the left is not
    • (?<![a-z])banana banana that is not preceded by a char a-z
  • ) Close lookbehind

The fourth bird answered Nov 15 '22 08:11


My two cents, assuming you do want to match words like "bananas":

(\b|\d)(?:banana|([a-zA-Z]+))(?1)

Your matches are in group 2.

  • (\b|\d) - A 1st capture group to hold a word-boundary or a digit.
  • (?:banana|([a-zA-Z]+)) - A non-capture group with the alternation of either exactly "banana" or a 2nd capture group of 1+ alpha-chars.
  • (?1) - Repeat the subpattern of the 1st capture group.

EDIT: If the subroutine call (?1) is not supported, you can try

(?:\b|\d)(?:banana|([a-zA-Z]+))(?:\b|\d)

Or, using lookarounds:

(?i)(?<![a-z])(?:banana|([a-z]+))(?![a-z])
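The lookaround variant can be tried with Python's re module (an assumption; Python's re has no (?1) subroutine call, so only this variant is sketched). Note that in this variant the only capture group is group 1, not group 2. The word bananas is appended to the sample for illustration.

```python
import re

# Question's sample plus an extra word (an assumption added for illustration).
text = "0827banana82/+wine22green-729 bananas"
pattern = re.compile(r"(?i)(?<![a-z])(?:banana|([a-z]+))(?![a-z])")

# A standalone "banana" matches without setting group 1, so it is filtered out.
words = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(words)  # ['wine', 'green', 'bananas']
```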

JvdV answered Nov 15 '22 06:11