Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find email with more than 2 dots using REGEXP MySQL function?

Tags:

regex

email

mysql

I have a big user database with a lot of spammy email account, like [email protected] or [email protected] and I want to remove them with one queries (instead of doing a php script that parses all the database).

I see MySQL have a REGEXP function, but it's hard to use it with "dot". I tried lots of things like :

  • SELECT * FROM users WHERE email REGEXP '([[...]]){2,}'
  • SELECT * FROM users WHERE email REGEXP '(\\.){2,}'
  • SELECT * FROM users WHERE email REGEXP '(.*)[[...]]{2}(.*)'

Each one fails.

Any ideas ?

like image 244
j0k Avatar asked Mar 18 '12 22:03

j0k


People also ask

How does regexp work in SQL?

REGEXP is the operator used when performing regular expression pattern matches. RLIKE is the synonym. It also supports a number of metacharacters which allow more flexibility and control when performing pattern matching. The backslash is used as an escape character.

What is the best type of query for validating the format of an email in MySQL?

What is the best query for validating the format of an email address in MySQL table? You can use a pure SELECT to validate Email Addresses: SELECT * FROM `users` WHERE `email` NOT REGEXP '^[^@]+@[^@]+\.

What is the difference between regexp and like in MySQL?

Basically, LIKE does very simple wildcard matches, and REGEX is capable of very complicated wildcard matches. In fact, regular expressions ( REGEX ) are so capable that they are [1] a whole study in themselves [2] an easy way to introduce very subtle bugs.


2 Answers

how about this?

SELECT * FROM users WHERE email REGEXP '[.].*[.].*@';

this searches for two (or more) dots before @ (which would still allow multi-dot domain names like .co.uk etc).

However, you should consider the possibility that some users may really have multiple dots in their username as well, and that will still be legit.

like image 114
poncha Avatar answered Oct 21 '22 05:10

poncha


As far as I've seen, this is usually only a spam issue when a gmail.com account is used. Most legitimate usernames with multiple dots will be something like [email protected] - so spammers who use 3 or more dots in their username - I will be pruning from my database, and I will be implementing code to the registration process to disallow the use of more than 2 dots in the username when a gmail address is used.

SELECT user_email FROM users WHERE email REGEXP '[.].*[.].*[.].*@';
like image 32
krnlpanic Avatar answered Oct 21 '22 04:10

krnlpanic