Python urlparse -- extract domain name without subdomain

I need a way to extract a domain name without the subdomain from a URL using Python's urlparse.

For example, I would like to extract "google.com" from a full url like "http://www.google.com".

The closest I can seem to come with urlparse is the netloc attribute, but that includes the subdomain, which in this example would be www.google.com.

I know that it is possible to write some custom string manipulation to turn www.google.com into google.com, but I want to avoid by-hand string transforms or regex in this task. (The reason for this is that I am not familiar enough with url formation rules to feel confident that I could consider every edge case required in writing a custom parsing function.)
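To make the concern concrete, here is a minimal sketch of the kind of by-hand transform in question and one place where it breaks. The helper name `naive_domain` is made up for illustration, and the import assumes Python 3, where urlparse lives in `urllib.parse`.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse


def naive_domain(url):
    """Keep only the last two dot-separated labels of the hostname (naive)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


# netloc still carries the subdomain.
print(urlparse("http://www.google.com").netloc)

# Works for a simple .com host...
print(naive_domain("http://www.google.com"))

# ...but returns "co.uk" here, where the registered domain is bbc.co.uk.
print(naive_domain("http://www.bbc.co.uk"))
```

Taking the last two labels only works when the suffix is a single label, which is exactly the sort of edge case that is easy to miss.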

Or, if urlparse can't do what I need, does anyone know any other Python url-parsing libraries that would?

asked Jan 18 '13 19:01

Clay Wardell

1 Answer

You probably want to check out tldextract, a library designed to do this kind of thing.

It uses the Public Suffix List to split the hostname based on known public suffixes (gTLDs, ccTLDs, and multi-part suffixes such as co.uk). Do note that this is just a lookup against a list, nothing special, so it can get out of date (although hopefully it's curated so as not to).

    >>> import tldextract
    >>> tldextract.extract('http://forums.news.cnn.com/')
    ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')

So in your case:

    >>> extracted = tldextract.extract('http://www.google.com')
    >>> "{}.{}".format(extracted.domain, extracted.suffix)
    'google.com'
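For intuition about the suffix-list lookup described above, here is a rough sketch of a longest-suffix match. It is not tldextract's actual implementation: `KNOWN_SUFFIXES` and `split_host` are hypothetical names, and the four-entry set is a stand-in for the real Public Suffix List, which has thousands of entries plus wildcard and exception rules that this sketch ignores.

```python
from urllib.parse import urlparse

# Hypothetical, tiny stand-in for the Public Suffix List (assumption for
# illustration only; the real list is far larger and has extra rule types).
KNOWN_SUFFIXES = {"com", "org", "uk", "co.uk"}


def split_host(url):
    """Return (subdomain, domain, suffix) using the longest matching suffix."""
    labels = (urlparse(url).hostname or "").split(".")
    # Candidates run from the whole host down to the last label, so the
    # first hit is the longest known suffix.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_SUFFIXES:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[:i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No known suffix: treat the whole host as the domain.
    return "", ".".join(labels), ""


subdomain, domain, suffix = split_host("http://www.bbc.co.uk")
print("{}.{}".format(domain, suffix))
```

Because the match depends entirely on what the list contains, a stale list gives stale answers, which is why keeping the suffix data current matters.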

answered Sep 16 '22 14:09

Gareth Latty

Tags: python, url, parsing, urlparse
