I have a large number of email addresses to validate. Initially I parse them with a regexp to throw out the completely crazy ones. I'm left with the ones that look sensible but still might contain errors.
I want to find which addresses have valid domains, so given an address at abcxyz.com I want to know whether it's even possible to send email to abcxyz.com.
I want to test whether the domain has a valid A or MX record. Is there an easy way to do it using only the Python standard library? I'd rather not add an additional dependency to my project just to support this feature.
There is no DNS interface in the standard library that can query MX records, so you will either have to roll your own or use a third-party library.
This is not a fast-changing concept though, so the external libraries are stable and well tested.
The one I've used successfully for the same task as in your question is PyDNS.
A very rough sketch of my code is something like this:
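import DNS, smtplib, socket

hostname = "abcxyz.com"  # domain part of the address being checked
DNS.DiscoverNameServers()
mx_hosts = DNS.mxlookup(hostname)

# Just doing the mxlookup might be enough for you,
# but do something like this to test for an SMTP server
for mx in mx_hosts:
    smtp = smtplib.SMTP()
    # ... if this doesn't raise an exception, it is a valid MX host
    try:
        smtp.connect(mx[1])  # mx is a (preference, hostname) tuple
    except (smtplib.SMTPConnectError, socket.error):
        # a refused or timed-out connection raises socket.error,
        # not SMTPConnectError, so catch both
        continue  # try the next MX server in the list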
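If an MX lookup is more than you need, the standard library can still answer a weaker question: does the domain resolve to any address at all? The following is only a minimal sketch under that assumption. The helper name `domain_resolves` is hypothetical, and it checks A/AAAA resolution through the system resolver, not MX records.

```python
import socket


def domain_resolves(domain):
    """Return True if the system resolver finds any address for `domain`.

    Hypothetical helper: this checks A/AAAA resolution only, not MX records.
    """
    try:
        # Resolution only; no connection is opened.
        socket.getaddrinfo(domain, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve.
        # UnicodeError: the name is malformed (e.g. an empty label).
        return False
    return True
```

Bear in mind that a domain can accept mail through MX records without having an address record of its own, so a `False` result here does not prove the domain is undeliverable; it is a cheap first filter rather than a replacement for the MX lookup.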
Another library that might be better/faster than PyDNS is dnsmodule, although it looks like it hasn't had any activity since 2002, compared to PyDNS's last update in August 2008.
Edit: I would also like to point out that email addresses can't be easily parsed with a regexp. You are better off using the parseaddr() function in the standard library email.utils module (see my answer to this question for example).
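As a rough illustration of the parseaddr() suggestion, the sketch below pulls the domain out of an address before any DNS check. The helper name `extract_domain` and the sample address in the usage note are made up for this example.

```python
from email.utils import parseaddr


def extract_domain(raw):
    """Return the lowercased domain of an address, or None if there is none.

    Hypothetical helper built on email.utils.parseaddr.
    """
    # parseaddr returns a (display name, address) pair
    _display_name, address = parseaddr(raw)
    if "@" not in address:
        return None
    # Split on the last "@" so only the domain part is returned
    domain = address.rsplit("@", 1)[1]
    return domain.lower() or None
```

For instance, `extract_domain("Jane Doe <jane@Example.org>")` gives `"example.org"`, which can then be passed to whichever DNS lookup you choose.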
An easy way to do this outside the standard library is to use the validate_email package:
from validate_email import validate_email
is_valid = validate_email('[email protected]', check_mx=True)
To process a large number of email addresses faster (e.g. a list emails), you could cache the domains that have already been verified and only do a check_mx if the domain isn't there. Something like:
emails = ["[email protected]", "email@bad_domain", "[email protected]", ...]
verified_domains = set()
for email in emails:
    domain = email.split("@")[-1]
    domain_verified = domain in verified_domains
    is_valid = validate_email(email, check_mx=not domain_verified)
    if is_valid:
        verified_domains.add(domain)
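The same caching idea can be sketched independently of any particular library. In the version below, `check_domains_once` and its `domain_checker` argument are hypothetical names: `domain_checker` stands in for whatever lookup you actually use (an MX query, validate_email, and so on). Unlike the loop above, this variant also remembers domains that failed, so a bad domain is looked up only once.

```python
def check_domains_once(emails, domain_checker):
    """Map each address to the result of checking its domain, once per domain.

    `domain_checker` is any callable that takes a domain string and returns
    a bool; it is a placeholder for the real lookup.
    """
    results_by_domain = {}  # cache of domain -> bool, good and bad alike
    results = {}
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        if domain not in results_by_domain:
            # First time this domain is seen: do the expensive check
            results_by_domain[domain] = domain_checker(domain)
        results[email] = results_by_domain[domain]
    return results
```

With many addresses sharing a few large mail providers, this reduces the number of lookups to the number of distinct domains.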