I want to get the domain part of an email address in JavaScript. It's easy to extract the domain via split from an email like "[email protected]", which is example.com.
However, emails also come in forms like "[email protected]", for which the domain is example.com.uk rather than subdomain1.example.com.uk. The problem here is that subdomain1 can be mistakenly considered part of the domain.
How do I do this reliably?
To extract the name from an email address, use the split() method to split the email on the @ symbol and access the first array element, e.g. email.split('@')[0]. The split method returns an array containing two strings, where the first string stores the name and the second stores everything after the @.
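The split step can be sketched as follows. This is only an illustration: `splitEmail` is a hypothetical helper name and the address is made up. It cuts at the last @ instead of calling split(), on the assumption that the host part never contains an @ while the local part occasionally might.

```javascript
// Hypothetical helper: separate the local part (name) from the host part.
function splitEmail(email) {
  const at = email.lastIndexOf("@");
  // Reject input with no "@", or with nothing on one side of it.
  if (at <= 0 || at === email.length - 1) {
    return null;
  }
  return {
    name: email.slice(0, at),
    host: email.slice(at + 1).toLowerCase(),
  };
}

console.log(splitEmail("jane@mail.example.co.uk"));
// { name: 'jane', host: 'mail.example.co.uk' }
```

Note that the host still includes any subdomain, which is exactly the part the question asks how to strip.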
This is not as trivial a problem as it might seem at first glance. Luckily there are libraries that solve it; tld-extract is a popular choice, which uses Mozilla's Public Suffix List (a volunteer-maintained list). The usage is:
var parser = require('tld-extract');
console.log( parser("www.google.com") );
console.log( parser("google.co.uk") );
/**
* >> { tld: 'com', domain: 'google.com', sub: 'www' }
* >> { tld: 'co.uk', domain: 'google.co.uk', sub: '' }
*/
To extract the server address part from an email address, first split on the @ character, like this:
const email = "[email protected]"
const address = email.split('@').pop()
const domain = parser(address).domain
For a more in-depth discussion of the problem and its solution, check the README of a similar Python library:
tldextract on the other hand knows what all gTLDs and ccTLDs look like by looking up the currently living ones according to the Public Suffix List (PSL). So, given a URL, it knows its subdomain from its domain, and its domain from its country code.
Make sure to read about the list on the Public Suffix List website, and understand that it is based on volunteer work and might not be exhaustive at all times.
The Public Suffix List is a cross-vendor initiative to provide an accurate list of domain name suffixes, maintained by the hard work of Mozilla volunteers and by submissions from registries, to whom we are very grateful.
Since there was and remains no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain (the policies differ with each registry), the only method is to create a list. This is the aim of the Public Suffix List.
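To make the list-based idea concrete, here is a rough sketch of longest-suffix matching. Everything in it is an assumption for illustration: `SAMPLE_SUFFIXES` is a tiny hand-picked sample rather than the real list, `registrableDomain` is a hypothetical helper, and the wildcard and exception rules of the real Public Suffix List are not handled. The result shape mirrors the tld-extract output shown earlier.

```javascript
// Illustrative sample only; the real Public Suffix List has thousands of entries.
const SAMPLE_SUFFIXES = new Set(["com", "org", "net", "uk", "co.uk", "org.uk"]);

// Hypothetical helper: find the longest known suffix of the host, then keep
// exactly one more label to its left as the registrable domain.
function registrableDomain(host, suffixes = SAMPLE_SUFFIXES) {
  const normalized = host.toLowerCase();
  // A host that is itself a public suffix has no registrable domain.
  if (suffixes.has(normalized)) {
    return null;
  }
  const labels = normalized.split(".");
  // Candidates run from longest to shortest, so the first hit is the longest match.
  for (let i = 1; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (suffixes.has(candidate)) {
      return {
        tld: candidate,
        domain: labels.slice(i - 1).join("."),
        sub: labels.slice(0, i - 1).join("."),
      };
    }
  }
  return null; // no known suffix found
}

console.log(registrableDomain("mail.example.co.uk"));
// { tld: 'co.uk', domain: 'example.co.uk', sub: 'mail' }
console.log(registrableDomain("www.google.com"));
// { tld: 'com', domain: 'google.com', sub: 'www' }
```

Trying the longest candidate first matters: with "uk" and "co.uk" both in the list, a shortest-first search would wrongly report co.uk as the domain.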
I agree that the best solution for this problem would be to use a library, like what was suggested in https://stackoverflow.com/a/49893282/2735286.
Yet if you have a long enough list of top-level domains and of second-level suffixes such as co.uk (called "subdomains" in the code below), you could write some code that extracts whatever characters are found after the '@' sign and then checks whether the host ends in a plain top-level domain or in one of those second-level suffixes. Once you know which kind of suffix you are dealing with, you know where the main domain name sits, and everything before it must be a subdomain.
This is a naive implementation, but you could try this:
// TODO: needs to have an exhaustive list of top level domains
const topLevelDomains = ["com", "org", "int", "gov", "edu", "net", "mil"];
// TODO: Needs an exhaustive list of subdomains
const subdomains = ["co.uk", "org.uk", "me.uk", "ltd.uk", "plc.uk"];
function extract(str) {
  const suffix = str.match(/.+@(.+)/);
  if (suffix) {
    const groups = suffix.pop().split(".");
    const lastPart = groups[groups.length - 1];
    if (isSubDomain(groups[groups.length - 2] + "." + lastPart)) {
      console.log("Sub domain detected in: " + groups);
      if (groups.length > 3) {
        console.log("Possible subdomain: " + groups.splice(0, groups.length - 3));
        console.log();
      }
    } else if (isTopLevelDomain(lastPart)) {
      console.log("Top level domain detected in: " + groups);
      if (groups.length > 2) {
        console.log("Possible subdomain: " + groups.splice(0, groups.length - 2));
        console.log();
      }
    }
  }
}
function isTopLevelDomain(lastPart) {
  return (topLevelDomains.find(s => s === lastPart));
}
function isSubDomain(lastPart) {
  return (subdomains.find(s => s === lastPart));
}
extract("[email protected]");
extract("[email protected]");
extract("[email protected]");
extract("[email protected]");
extract("[email protected]");
Please challenge the logic, if I got this wrong.