 

Regex lookahead discard a match

Tags: c#, regex

I am trying to write a regex that discards a candidate match completely instead of matching part of it.

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

This is the match and this is my regex101 test.

But when an email starts with -, _ or ., the whole email should be rejected, not matched with the leading symbol stripped. Any ideas are welcome; I've been searching for the past half hour but can't figure out how to drop the entire email when it starts with one of those symbols.
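
To make the problem concrete, here is a minimal C# sketch that runs the pattern from the question against a few inputs. The class name and the sample addresses are made up for illustration; only the pattern comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Returns the first match in the input, or an empty string if there is none.
    public static string FirstMatch(string input) => Regex.Match(input, Pattern).Value;

    public static void Main()
    {
        // Hypothetical sample inputs.
        var inputs = new[]
        {
            "john@example.com",   // matched as is
            "-john@example.com",  // matched as "john@example.com": the '-' is skipped
            ".john@example.com",  // matched as "john@example.com": the '.' is skipped
            "_john@example.com"   // matched in full, because \w includes '_'
        };

        foreach (var input in inputs)
        {
            Console.WriteLine($"{input} -> \"{FirstMatch(input)}\"");
        }
    }
}
```

The regex engine is free to start a match at any position, so an unwanted leading character is simply left out of the match unless the pattern also constrains what comes before the first character.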

dev asked Sep 27 '22 23:09


2 Answers

You can use a positive lookbehind to check that we are at the beginning of the string or right after a whitespace, then make sure the first symbol is not one of the unwanted characters by matching it with the negated class [^\s\-_.], and put a word boundary right before @:

(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*

See demo

List of matches:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
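
As a rough illustration of how the lookbehind pattern above might be used from C#, the sketch below collects all matches from a piece of text. The class name, method name and sample text are assumptions made for this example; the pattern is copied verbatim from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindEmailDemo
{
    // Pattern from the answer, copied verbatim.
    public const string Pattern =
        @"(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*";

    // Returns every substring of the text that the pattern matches.
    public static List<string> FindEmails(string text)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample text: three of the five addresses start with -, _ or .
        var text = "john@example.com -jane@example.com _bob@example.com " +
                   ".amy@example.com a.b+c@mail.example.org";

        foreach (var email in FindEmails(text))
        {
            Console.WriteLine(email);
        }
        // Prints:
        // john@example.com
        // a.b+c@mail.example.org
    }
}
```

The addresses with a leading symbol are dropped entirely: the symbol itself is excluded by the negated class, and the match cannot start one character later because the lookbehind would then see the symbol instead of whitespace or the start of the string.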

Additional notes on usage and alternative notation

Note that it is best practice to escape as few characters as possible in a regex, so [^\s\-_.] can be written as [^\s_.-]: a hyphen at the end of a character class denotes a literal hyphen, not a range. Also, if you plan to use the pattern in other regex engines, the alternation inside the lookbehind may cause problems; in that case, replace (?<=\s|^) with the equivalent (?<!\S). See this regex:

(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
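
The claim that the two lookbehind forms are interchangeable can be spot-checked with a small C# sketch like the one below. The class and member names and the sample texts are invented for this example, and agreeing on a few samples is of course not a proof of equivalence.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindEquivalenceDemo
{
    // Shared email part, using the less-escaped character class [^\s_.-].
    private const string EmailBody =
        @"[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*";

    public const string WithAlternation = @"(?<=^|\s)" + EmailBody;
    public const string WithNegativeLookbehind = @"(?<!\S)" + EmailBody;

    // Returns all matches joined with commas, so two results are easy to compare.
    public static string FindAll(string pattern, string text)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            values.Add(m.Value);
        }
        return string.Join(",", values);
    }

    public static void Main()
    {
        // Hypothetical sample texts, including line breaks (which count as whitespace).
        var samples = new[]
        {
            "first@example.com\n-second@example.com\nthird@example.org",
            "x .bad@example.com good@example.com"
        };

        foreach (var text in samples)
        {
            var a = FindAll(WithAlternation, text);
            var b = FindAll(WithNegativeLookbehind, text);
            Console.WriteLine($"same result: {a == b} ({b})");
        }
    }
}
```

Both forms express the same condition: the position is either the start of the string or is preceded by a whitespace character.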

And last but not least, if you need to use it in a regex engine that does not support lookbehinds (for example, JavaScript engines that predate ECMAScript 2018), replace the (?<!\S)/(?<=\s|^) with a non-capturing group (?:\s|^), wrap the whole email pattern in a set of capturing parentheses, and use the language's means to grab the Group 1 contents:

(?:\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)

See the regex demo.
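
The group-based workaround can be sketched in C# as well, even though .NET does support lookbehinds. In this sketch the leading group is written as non-capturing, (?:\s|^), so that the address ends up in Group 1. The class name, method name and sample text are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureGroupEmailDemo
{
    // No lookbehind: the leading whitespace (or start of string) is consumed by
    // a non-capturing group, and the address itself is captured as Group 1.
    public const string Pattern =
        @"(?:\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)";

    // Returns the Group 1 value of every match, i.e. the address without the
    // whitespace character that precedes it.
    public static List<string> FindEmails(string text)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample text.
        var text = "john@example.com jane@example.com -x@example.com";

        foreach (var email in FindEmails(text))
        {
            Console.WriteLine(email);
        }
        // Prints:
        // john@example.com
        // jane@example.com
    }
}
```

Reading m.Value instead of m.Groups[1].Value would include the leading whitespace character in every match except one at the very start of the text, which is why the capturing group is needed.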

Wiktor Stribiżew answered Nov 11 '22 04:11


I use this for multiple email addresses, separated by ';':

([A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*

For a single mail:

[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}
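
One possible way to use these two patterns for validation in C# is sketched below. The ^ and $ anchors, the class name and the method names are assumptions added for this example; they are not part of the patterns as posted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleEmailValidationDemo
{
    // Patterns as posted above.
    public const string SinglePattern = @"[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}";
    public const string ListPattern = @"([A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*";

    // Assumption: anchors are added so that the whole input must match.
    public static bool IsSingleEmail(string input) =>
        Regex.IsMatch(input, "^" + SinglePattern + "$");

    public static bool IsEmailList(string input) =>
        Regex.IsMatch(input, "^" + ListPattern + "$");

    public static void Main()
    {
        Console.WriteLine(IsSingleEmail("user@example.com"));     // True
        Console.WriteLine(IsSingleEmail("-user@example.com"));    // True: '-' is in the first class
        Console.WriteLine(IsEmailList("a@example.com;b@example.org;")); // True
        Console.WriteLine(IsEmailList("a@example.com;b@example.org"));  // False: no trailing ';'
        Console.WriteLine(IsEmailList(""));                        // True: zero repetitions
    }
}
```

Note that, as written, the first character class contains -, _ and ., so an address that starts with one of those symbols is accepted, and the list pattern expects every address, including the last one, to be followed by ';'.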

Piero Alberto answered Nov 11 '22 05:11