
How to extract top-level domain name (TLD) from URL

How would you extract the domain name from a URL, excluding any subdomains?

My initial simplistic attempt was:

'.'.join(urlparse.urlparse(url).netloc.split('.')[-2:]) 

This works for http://www.foo.com, but not for http://www.foo.com.au. Is there a way to do this properly without using special knowledge about valid TLDs (Top Level Domains) or country codes (because they change)?
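For readers on Python 3, the attempt above can be reproduced roughly as follows. This is only a sketch: the function name naive_domain is made up for illustration, and urllib.parse stands in for the Python 2 urlparse module used in the question.

```python
from urllib.parse import urlparse


def naive_domain(url):
    """Keep only the last two dot-separated labels of the host name."""
    return ".".join(urlparse(url).netloc.split(".")[-2:])


print(naive_domain("http://www.foo.com"))     # foo.com
print(naive_domain("http://www.foo.com.au"))  # com.au (the registered name "foo" is lost)
```

The second call shows the problem: "com.au" is a suffix under which names are registered, so the last two labels are not enough.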

thanks

Asked by hoju on Jul 01 '09 at 01:07



2 Answers

Here's a great Python module someone wrote to solve this problem after seeing this question: https://github.com/john-kurkowski/tldextract

The module looks up TLDs in the Public Suffix List, maintained by Mozilla volunteers.

Quote:

tldextract on the other hand knows what all gTLDs [Generic Top-Level Domains] and ccTLDs [Country Code Top-Level Domains] look like by looking up the currently living ones according to the Public Suffix List. So, given a URL, it knows its subdomain from its domain, and its domain from its country code.
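A minimal usage sketch, assuming the third-party tldextract package is installed (pip install tldextract). The helper name domain_without_subdomains is hypothetical; tldextract.extract and the subdomain, domain and suffix fields of its result come from the library.

```python
def domain_without_subdomains(url):
    """Return the domain plus its public suffix, or None if there is none."""
    # Third-party package; imported here so the dependency is only
    # needed when the function is actually called.
    import tldextract

    parts = tldextract.extract(url)  # result has .subdomain, .domain, .suffix
    if not parts.domain or not parts.suffix:
        return None  # e.g. an IP address or a bare host name
    return parts.domain + "." + parts.suffix
```

Called with http://www.foo.com.au, this should give foo.com.au, since com.au is on the Public Suffix List. Note that the library may try to download a fresh copy of the list the first time it is used.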

Answered by Acorn on Oct 05 '22 at 04:10


No, there is no "intrinsic" way of knowing that (e.g.) zap.co.it is a subdomain (because Italy's registrar DOES sell domains such as co.it) while zap.co.uk isn't (because the UK's registrar DOESN'T sell domains such as co.uk, only ones such as zap.co.uk).

You'll just have to use an auxiliary table (or online source) to tell you which TLDs behave peculiarly, like the UK's and Australia's. There's no way of divining that from just staring at the string without such extra semantic knowledge. (Of course the table can change eventually, but if you can find a good online source, that source will also change accordingly, one hopes!)
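To make the auxiliary-table idea concrete, here is a rough sketch using only the standard library. The KNOWN_SUFFIXES set and the registered_domain function are hypothetical and deliberately tiny; a real table would be loaded from a maintained source such as the Public Suffix List, which also has wildcard and exception rules that this sketch ignores.

```python
from urllib.parse import urlparse

# Hypothetical, hand-written table for illustration only.
KNOWN_SUFFIXES = {"com", "it", "uk", "au", "co.uk", "com.au"}


def registered_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the label just left of the longest known suffix, plus that suffix."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so that "co.uk" is matched before "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the host is itself a bare suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


print(registered_domain("http://www.zap.co.uk"))  # zap.co.uk
print(registered_domain("http://zap.co.it"))      # co.it
```

With this table, zap.co.uk is kept whole because co.uk is listed as a suffix, while zap.co.it reduces to co.it because only it is listed.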

Answered by Alex Martelli on Oct 05 '22 at 05:10