
Regex to strip a variable number of periods from username in email address?

I'm cleaning up spam accounts in my forum, and found a bunch of email addresses with the following format:

[email protected]
[email protected]
[email protected]

Gmail treats these all as the same email account, whereas the forum software treats them as distinct addresses, so spammers use this trick to reuse the same email address again and again when creating spam forum accounts.

In order to identify them, I need to strip out all the periods before the @gmail.com. Then it's easy to identify all the duplicate accounts.

Fortunately, MariaDB 10 has a new REGEXP_REPLACE function designed for exactly these types of problems. Unfortunately, I can't figure out the correct regex.

My primary stumbling block is that the number of periods varies drastically, and I'm not sure how to write a regex when the number of matches varies and they can appear anywhere in the string. I've found as many as 8 periods in one of these email addresses, at seemingly random positions.

It'd be easy if I could just strip out all periods, but I can't, because the @gmail.com part has to stay untouched. Additionally, the regex should only match @gmail.com addresses and ignore other email providers.

How do I do this?

Jeff Widman asked Dec 28 '14 07:12




2 Answers

There's another trick with Gmail addresses: any text after a + character in the username is ignored, so e.g. [email protected] and [email protected] are effectively the same address.

You can use this pattern to remove all text after a + character, as well as all dots (shamelessly based on Raj's pattern, please don't hate me):

(?:\.|\+.*)(?=.*?@gmail\.com)

(replace with the empty string)
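
To try the pattern outside the database, here is a minimal sketch in Python, whose re module accepts the same lookahead syntax. The function name normalize_gmail and the sample addresses are made up for illustration. MariaDB's REGEXP_REPLACE uses the PCRE library, so the behaviour should carry over, but that is an assumption worth checking against your own data.

```python
import re

# Pattern from the answer above: match a dot, or a "+" and everything after it,
# but only while "@gmail.com" still follows later in the string.
GMAIL_NOISE = re.compile(r"(?:\.|\+.*)(?=.*?@gmail\.com)")


def normalize_gmail(address: str) -> str:
    """Strip dots and any +suffix from the username of a Gmail address."""
    return GMAIL_NOISE.sub("", address)


if __name__ == "__main__":
    # Hypothetical sample addresses, not taken from the question.
    samples = (
        "j.o.h.n.doe@gmail.com",
        "john.doe+forum@gmail.com",
        "jane.doe@example.org",
    )
    for sample in samples:
        print(sample, "->", normalize_gmail(sample))
```

The dot in the domain survives because no further @gmail.com follows it, so the lookahead fails there. When the pattern is written inside a MariaDB string literal, the backslashes will usually need to be doubled.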


Aran-Fey answered Oct 19 '22 05:10


Use a positive lookahead assertion to match every dot that appears before the @gmail.com:

\.(?=.*?@gmail\.com)

Then replace the matched dots with an empty string.
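
As a rough sketch of how the stripped addresses can then expose duplicates, the Python below applies this answer's pattern and groups addresses that collapse to the same value. The helper name group_duplicates and the sample addresses are hypothetical. In MariaDB the same idea would presumably be a GROUP BY on the REGEXP_REPLACE result.

```python
import re
from collections import defaultdict

# Pattern from this answer: a dot that still has "@gmail.com" somewhere after it.
DOT_BEFORE_GMAIL = re.compile(r"\.(?=.*?@gmail\.com)")


def group_duplicates(addresses):
    """Map each dot-stripped address to the original addresses that share it.

    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for address in addresses:
        key = DOT_BEFORE_GMAIL.sub("", address)
        groups[key].append(address)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    accounts = [
        "spam.mer@gmail.com",
        "s.p.a.m.m.e.r@gmail.com",
        "spammer@gmail.com",
        "some.one@example.org",
    ]
    for key, members in group_duplicates(accounts).items():
        print(key, members)
```

Addresses at other providers keep their dots, so they only group together when they are already identical.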


Avinash Raj answered Oct 19 '22 07:10