Regex - Match words in pattern, except within email address

I'm looking to find words in a string that match a specific pattern. Problem is, if the words are part of an email address, they should be ignored.

To simplify, the pattern for the "proper words" is \w+\.\w+: one or more word characters, a literal period, and another run of word characters.
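A short Python sketch shows what the base pattern does on its own. The address d@d.e.e below is a hypothetical stand-in, because the real address in the question is obfuscated. With no guards at all, the pattern also picks up a dotted piece from inside the address.

```python
import re

# Base "proper word" pattern from the question: word chars, a literal dot, word chars.
WORD = re.compile(r"\w+\.\w+")

# Hypothetical stand-in for the obfuscated sample sentence.
SAMPLE = "a.a b.b:c.c d@d.e.e"

matches = WORD.findall(SAMPLE)
print(matches)  # ['a.a', 'b.b', 'c.c', 'd.e']
```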

For example, the sentence that causes the problem is a.a b.b:c.c [email protected].

The goal is to match only [a.a, b.b, c.c]. With most regexes I build, e.e is returned as well (because I use some kind of word-boundary match).

For example:

>>> re.findall(r"(?:^|\s|\W)(?<!@)(\w+\.\w+)(?!@)\b", "a.a b.b:c.c [email protected]")
['a.a', 'b.b', 'c.c', 'e.e']
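One plausible reason e.e slips through can be reproduced with the hypothetical stand-in address d@d.e.e. The real address is hidden, so this is an assumption about its shape. The \W alternative in the prefix can consume the period inside the domain. The (?<!@) lookbehind then sees "." instead of "@" and lets e.e pass. Printing each full match next to its captured group makes the consumed prefix character visible.

```python
import re

# The regex from the question.
PATTERN = re.compile(r"(?:^|\s|\W)(?<!@)(\w+\.\w+)(?!@)\b")

# Hypothetical stand-in; the original address is obfuscated.
SAMPLE = "a.a b.b:c.c d@d.e.e"

# group(0) includes the consumed prefix character; group(1) is the captured word.
spans = [(m.group(0), m.group(1)) for m in PATTERN.finditer(SAMPLE)]
for whole, word in spans:
    print(repr(whole), "->", word)
# The last match is '.e.e': the "." before e.e was eaten by \W.
```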

How can I match only within words that do not contain "@"?

alon asked Aug 01 '17 15:08
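The following is a regex-only alternative. It assumes that a proper word must not touch a word character, a period, or an "@" on either side. Both guards are zero-width lookarounds, so nothing inside an address can be consumed as a separator (unlike the (?:^|\s|\W) prefix in the question). The helper name proper_words and the stand-in address d@d.e.e are hypothetical.

```python
import re

# Hypothetical pattern: reject a candidate glued to \w, "." or "@" on either side.
PROPER_WORD = re.compile(r"(?<![\w.@])\w+\.\w+(?![\w.@])")

def proper_words(text):
    # Lookarounds consume nothing, so the address cannot supply a fake separator.
    return PROPER_WORD.findall(text)

print(proper_words("a.a b.b:c.c d@d.e.e"))  # ['a.a', 'b.b', 'c.c']
```

The trade-off is that a word immediately followed by a sentence-ending period, such as "see a.a.", is also rejected. If that matters, loosen the lookahead.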


People also ask

How do you exclude words in regex?

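To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [; otherwise it stands for a literal caret. This excludes individual characters, not whole words. To exclude a word, a negative lookahead such as (?!word) is the usual tool.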
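A minimal Python sketch of these rules using the standard re module. The sample strings are made up for illustration.

```python
import re

# [^@\s] matches any single character that is neither "@" nor whitespace.
chars = re.findall(r"[^@\s]+", "user@host name")

# A caret that is not first inside the brackets is just a literal "^".
literal_caret = re.findall(r"[a^]", "a^b")

# A negated class excludes characters; a negative lookahead can exclude a whole word.
not_cat = re.findall(r"\b(?!cat\b)\w+", "cat dog catalog")

print(chars)          # ['user', 'host', 'name']
print(literal_caret)  # ['a', '^']
print(not_cat)        # ['dog', 'catalog']
```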

What is difference [] and () in regex?

[] denotes a character class; () denotes a capturing group. [a-z0-9] matches one character that is in the range a-z or 0-9. (a-z0-9) captures the literal text a-z0-9, because outside a character class the hyphen has no special meaning.
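A quick Python check of the difference, with made-up input strings:

```python
import re

# [a-z0-9]: a character class, matching ONE character from either range.
class_hits = re.findall(r"[a-z0-9]", "x-9")

# (a-z0-9): a capturing group; outside brackets the hyphen is literal,
# so this matches the exact text "a-z0-9".
group_hits = re.findall(r"(a-z0-9)", "ranges like a-z0-9 here")

print(class_hits)  # ['x', '9']
print(group_hits)  # ['a-z0-9']
```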

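What does (?i) do in regex?

(?i) makes the regex case-insensitive. Some regex flavors also accept (?c) to make it case-sensitive again, but Python's re module does not. In Python 3.6+, (?-i:...) turns case-insensitivity off for just the enclosed part.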
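Here is a small Python sketch of inline and scoped flags. The scoped form needs Python 3.6 or later. A global flag such as (?i) is placed at the very start of the pattern, which Python 3.11+ requires.

```python
import re

# Global inline flag at the start of the pattern: the whole match ignores case.
ci = re.findall(r"(?i)abc", "ABC abc aBc")

# Scoped flag: case-insensitive overall, but "B" must be uppercase.
mixed = re.findall(r"(?i)a(?-i:B)", "ab aB AB")

# Python's re has no (?c) flag, so compiling it raises re.error.
c_rejected = False
try:
    re.compile(r"(?c)abc")
except re.error as exc:
    c_rejected = True
    print("rejected:", exc)

print(ci)     # ['ABC', 'abc', 'aBc']
print(mixed)  # ['aB', 'AB']
```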

How do I match a special character literally in regex?

To match a character that has a special meaning in regex, escape it with a backslash (\). For example, \. matches ".", \+ matches "+", and \( matches "(". To match a backslash itself, use \\. In Python, write patterns as raw strings (r"...") so the backslashes reach the regex engine unchanged.
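A short Python sketch of escaping. It also uses re.escape, which builds the escaped form from arbitrary text. The sample strings are illustrative.

```python
import re

# Escaped metacharacters match themselves literally.
plus = re.findall(r"\+", "1+1=2")            # ['+']
dot = re.findall(r"a\.b", "a.b axb")         # ['a.b'] ("axb" is not matched)
backslash = re.findall(r"\\", "C:\\temp")    # one literal backslash

# re.escape escapes special characters for you.
pattern = re.escape("1+1")
print(pattern)                               # 1\+1
print(bool(re.fullmatch(pattern, "1+1")))    # True
```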


1 Answer

I would definitely clean up the input first and simplify the regex.

First, split the string on colons and whitespace:

import re
words = re.split(r':|\s', "a.a b.b:c.c [email protected]")

Then filter out the words that contain an @:

words = [word for word in words if re.search(r'^((?!@).)*$', word)]
Cory Madden answered Sep 20 '22 14:09
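Here the answer's two steps are combined into one runnable function. The third step, keeping only tokens that fully match \w+\.\w+, goes beyond the answer and assumes that is what counts as a proper word. The helper name find_proper_words and the stand-in address d@d.e.e are hypothetical.

```python
import re

WORD = re.compile(r"\w+\.\w+")

def find_proper_words(text):
    """Hypothetical helper: return dotted words, skipping tokens that contain "@"."""
    # Step 1 (as in the answer): split on colons and whitespace.
    tokens = re.split(r":|\s", text)
    # Step 2 (equivalent to the answer's regex test): drop tokens containing "@".
    tokens = [t for t in tokens if "@" not in t]
    # Step 3 (added): keep only tokens that are exactly word.word.
    return [t for t in tokens if WORD.fullmatch(t)]

print(find_proper_words("a.a b.b:c.c d@d.e.e"))  # ['a.a', 'b.b', 'c.c']
```

Consecutive separators make re.split produce empty strings. The fullmatch step discards them, so no extra cleanup is needed.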