regex to match entire words containing only certain characters

Tags: regex

I want to match entire words (or strings, really) that contain only a defined set of characters.

For example, if the letters are d, o, and g:

dog = match
god = match
ogd = match
dogs = no match (because the string also has an "s" which is not defined)
gods = no match
doog = match
gd = match

In this sentence:

dog god ogd, dogs o

...I would expect to match dog, god, and o (not ogd, because of the trailing comma, and not dogs, because of the s).

user1179784 asked May 23 '12 03:05



3 Answers

This should work for you:

\b[dog]+\b(?![,])

Explanation

r"""
\b        # Assert position at a word boundary
[dog]     # Match a single character present in the list “dog”
   +         # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b        # Assert position at a word boundary
(?!       # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   [,]       # Match the character “,”
)
"""
Narendra Yadala answered Sep 21 '22 22:09



The following regex represents one or more occurrences of the three characters you're looking for:

[dog]+

Explanation:

The square brackets mean: "any of the enclosed characters".

The plus sign means: "one or more occurrences of the previous expression".

This would be exactly the same thing, because the order of the characters inside the brackets does not matter:

[ogd]+
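
To see how this character class behaves, here is a small Python sketch. Python's `re` module is an assumption (the answer is language-neutral), and the names `allowed`, `results`, and `partial` are illustrative only. It suggests that the bare pattern needs to be anchored, here via `re.fullmatch`, when the goal is to test a whole string.

```python
import re

allowed = re.compile(r"[dog]+")

# fullmatch succeeds only when the entire string is made of d, o, g.
words = ["dog", "god", "ogd", "dogs", "gods", "doog", "gd"]
results = {word: bool(allowed.fullmatch(word)) for word in words}
for word, ok in results.items():
    print(word, "match" if ok else "no match")

# Without anchoring, the pattern also matches inside longer words:
# the "dog" in "dogs" and the "ogd" before the comma are both found.
partial = allowed.findall("dog god ogd, dogs o")
print(partial)  # ['dog', 'god', 'ogd', 'dog', 'o']
```
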
jahroy answered Sep 21 '22 22:09



Which regex flavor/tool are you using? (e.g. JavaScript, .NET, Notepad++, etc.) If it's one that supports lookahead and lookbehind, you can do this:

(?<!\S)[dog]+(?!\S)

This way, you'll only get matches that are at the beginning of the string or preceded by whitespace, and at the end of the string or followed by whitespace. If you can't use lookbehind (for example, in JavaScript engines that predate ES2018), you can spell out the leading condition:

(?:^|\s)([dog]+)(?!\S)

In this case you would retrieve the matched word from group #1. But don't take the next step and try to replace the lookahead with (?:$|\s). If you did that, the first hit ("dog") would consume the trailing space, and the regex wouldn't be able to use it to match the next word ("god").
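
The three variants discussed above can be compared side by side. This is a minimal sketch in Python, which is an assumption since the answer leaves the regex flavor open; the names `lookaround`, `grouped`, and `consuming` are illustrative.

```python
import re

# Sample sentence taken from the question.
sentence = "dog god ogd, dogs o"

# Lookbehind and lookahead: the match itself is the word.
lookaround = re.compile(r"(?<!\S)[dog]+(?!\S)")
with_lookaround = lookaround.findall(sentence)
print(with_lookaround)  # ['dog', 'god', 'o']

# No lookbehind: the leading whitespace is consumed, so the word
# is read from group 1 (findall returns that group's contents).
grouped = re.compile(r"(?:^|\s)([dog]+)(?!\S)")
with_group = grouped.findall(sentence)
print(with_group)  # ['dog', 'god', 'o']

# The pitfall: also consuming the trailing whitespace. The space
# after "dog" is used up, so "god" has no leading \s left to match.
consuming = re.compile(r"(?:^|\s)([dog]+)(?:$|\s)")
with_consuming = consuming.findall(sentence)
print(with_consuming)  # ['dog', 'o']
```
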

Alan Moore answered Sep 21 '22 22:09
