Extract domain from URL in python [duplicate]

Tags:

python

url

I have a URL like:
http://abc.hostname.com/somethings/anything/

I want to get:
hostname.com

What module can I use to accomplish this?
I want to be able to use the same module and method in Python 2 as well.

Amit asked May 22 '17 12:05



People also ask

How do I find the domain of a URL in Python?

To get the domain from a URL in Python, the easiest way is to call the `urlparse()` function from the `urllib.parse` module and read the `netloc` attribute of the result. Being able to pull individual parts out of a URL like this is often useful when working with URLs in Python.
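A minimal sketch of that approach, using the question's URL with an invented user name and port added purely to show the difference between two attributes: `netloc` is the whole authority part as written, while `hostname` drops the user info and port and lowercases the host. The `try`/`except` import is a common pattern for running the same code under Python 2, where `urlparse()` lives in the `urlparse` module instead.

```python
# Use urllib.parse on Python 3; fall back to the Python 2 module name.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Example URL; the "user@" part and ":8080" port are illustrative additions.
parts = urlparse("http://user@ABC.hostname.com:8080/somethings/anything/")

# netloc: raw authority (user info, host, port), case preserved.
print(parts.netloc)    # user@ABC.hostname.com:8080
# hostname: host only, lowercased, without user info or port.
print(parts.hostname)  # abc.hostname.com
# port: parsed as an int (None when the URL has no explicit port).
print(parts.port)      # 8080
```

For a URL without user info or port, such as the one in the question, `netloc` and `hostname` give the same `abc.hostname.com` (apart from case).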

How do I remove a domain from URL in Python?

First split on `http://` to remove the scheme from the string. Then split on `/` to drop any directory or subdirectory parts. Finally, split the host on `.` and join the second-to-last token (index `[-2]`) with the last token, which gives the domain name together with its top-level domain (for example `hostname.com`).
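The string-splitting steps above can be sketched as follows. This is an illustrative helper (the name `naive_domain` is hypothetical), and it splits on `"://"` rather than `"http://"` so that `https` URLs are handled the same way. It does not handle ports or user info, and it fails for hosts with a single label such as `localhost`.

```python
def naive_domain(url):
    """Return the last two dot-separated labels of the URL's host (naive)."""
    # Drop the scheme: keep everything after the first "://", if present.
    host = url.split("://", 1)[-1]
    # Drop the path: keep everything before the first "/".
    host = host.split("/", 1)[0]
    labels = host.split(".")
    # Join the second-to-last label with the last one.
    return labels[-2] + "." + labels[-1]


print(naive_domain("http://abc.hostname.com/somethings/anything/"))  # hostname.com
print(naive_domain("https://www.example.co.uk/page"))                # co.uk
```

The second call shows the weakness of this approach: for suffixes that span two labels, such as `co.uk`, it returns the suffix itself instead of `example.co.uk`.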

How do I extract a URL from text in Python?

URLs can be extracted from a text file with a regular expression: the expression matches wherever the text fits the pattern. Only the standard `re` module is needed for this.
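A minimal sketch, assuming a deliberately simple pattern (`http://` or `https://` followed by any run of non-whitespace characters); the pattern and the helper name `extract_urls` are illustrative, not a complete URL grammar. To process a file, pass its contents, e.g. `extract_urls(open(path).read())`.

```python
import re

# Simple, illustrative pattern: "http://" or "https://" then non-whitespace.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def extract_urls(text):
    """Return every substring of text that matches URL_PATTERN, in order."""
    return URL_PATTERN.findall(text)


text = "See http://abc.hostname.com/somethings/anything/ and https://www.example.test/foo for details."
print(extract_urls(text))
```

Because the pattern stops only at whitespace, punctuation directly after a URL (such as a trailing period) ends up in the match; a stricter pattern is needed if that matters.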


1 Answer

For parsing the domain of a URL in Python 3, you can use the following (in Python 2, the same function is imported with `from urlparse import urlparse`):

```python
from urllib.parse import urlparse

domain = urlparse('http://www.example.test/foo/bar').netloc
print(domain)  # --> www.example.test
```

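However, to reliably get the registered domain (`example.test` in this example), you need to install a specialized library such as tldextract. Simply taking the last two labels of the host is not enough, because some public suffixes span more than one label (for example `co.uk`), and only an up-to-date list of such suffixes (the Public Suffix List, which tldextract uses) can tell them apart.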
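To illustrate what such a library does, here is a minimal stdlib-only sketch of the "longest matching public suffix plus one label" rule. The suffix set below is a hypothetical, tiny stand-in for the real Public Suffix List (which has thousands of entries plus wildcard and exception rules), and `registered_domain` is an illustrative name, not tldextract's API; use the library for real code.

```python
from urllib.parse import urlparse

# HYPOTHETICAL sample suffixes for illustration only; NOT the real Public Suffix List.
SAMPLE_SUFFIXES = {"com", "test", "uk", "co.uk"}


def registered_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching suffix plus one label to its left, or None."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Check candidate suffixes from longest (whole host) to shortest.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in suffixes:
            # One label is needed left of the suffix; a bare suffix has none.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


print(registered_domain("http://www.example.test/foo/bar"))              # example.test
print(registered_domain("https://www.example.co.uk/"))                   # example.co.uk
print(registered_domain("http://abc.hostname.com/somethings/anything/"))  # hostname.com
```

Checking the longest suffix first is what makes `www.example.co.uk` resolve to `example.co.uk` instead of `co.uk`.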

Philipp Claßen answered Sep 30 '22 04:09
