how would you extract the domain name from a URL, excluding any subdomains?
My initial simplistic attempt was:
'.'.join(urlparse.urlparse(url).netloc.split('.')[-2:])
This works for http://www.foo.com, but not http://www.foo.com.au. Is there a way to do this properly without using special knowledge about valid TLDs (Top Level Domains) or country codes (because they change).
thanks
A TLD (top-level domain) is the most generic domain in the Internet's hierarchical DNS (domain name system). A TLD is the final component of a domain name, for example, "org" in developer.mozilla.org . ICANN (Internet Corporation for Assigned Names and Numbers) designates organizations to manage each TLD.
A top-level domain is an integral part of your website's structure. Before you buy one, take the time to look at the different TLD options and select the best one to represent your business online. If you need to change your TLD or domain name later, you can.
Here's a great python module someone wrote to solve this problem after seeing this question: https://github.com/john-kurkowski/tldextract
The module looks up TLDs in the Public Suffix List, mantained by Mozilla volunteers
Quote:
tldextract
on the other hand knows what all gTLDs [Generic Top-Level Domains] and ccTLDs [Country Code Top-Level Domains] look like by looking up the currently living ones according to the Public Suffix List. So, given a URL, it knows its subdomain from its domain, and its domain from its country code.
No, there is no "intrinsic" way of knowing that (e.g.) zap.co.it
is a subdomain (because Italy's registrar DOES sell domains such as co.it
) while zap.co.uk
isn't (because the UK's registrar DOESN'T sell domains such as co.uk
, but only like zap.co.uk
).
You'll just have to use an auxiliary table (or online source) to tell you which TLD's behave peculiarly like UK's and Australia's -- there's no way of divining that from just staring at the string without such extra semantic knowledge (of course it can change eventually, but if you can find a good online source that source will also change accordingly, one hopes!-).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With