I have a URL like: http://abc.hostname.com/somethings/anything/
I want to get: hostname.com
What module can I use to accomplish this?
I want to use the same module and method in Python 2.
To get the domain from a URL in Python, the easiest way is to call the urlparse() function from the urllib.parse module and read the netloc attribute of the result. Note that netloc is the full host (abc.hostname.com for the URL above), including any subdomain and port, so it is not yet hostname.com.
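Since the question also asks about Python 2, here is a minimal sketch of an import that works on both versions. In Python 2 the same function lives in the urlparse module rather than urllib.parse. The helper name get_host is only illustrative and does not come from any library.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

def get_host(url):
    # netloc is the network location: the host, plus a port if one is given.
    return urlparse(url).netloc

print(get_host("http://abc.hostname.com/somethings/anything/"))  # abc.hostname.com
```

The rest of the code can then call urlparse() the same way on either interpreter.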
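Another approach uses plain string splitting. First split on http:// to remove the scheme from the string. Then split on / to remove all directory and sub-directory parts, leaving only the host. Finally split the host on . and join the second-to-last token ([-2]) with the last token. For simple cases this gives the registered domain (hostname.com), but it gives the wrong answer for multi-part suffixes such as .co.uk.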
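The splitting steps described above can be sketched roughly as follows. This is only an illustration of the idea, not a robust parser: the function name naive_domain is made up for this example, and the second call shows the kind of input where the approach breaks down.

```python
def naive_domain(url):
    # Drop the scheme ("http://", "https://", ...) if one is present.
    without_scheme = url.split("://", 1)[-1]
    # Keep only the host, i.e. everything before the first "/".
    host = without_scheme.split("/", 1)[0]
    # Join the second-to-last and last dot-separated tokens.
    labels = host.split(".")
    return ".".join(labels[-2:])

print(naive_domain("http://abc.hostname.com/somethings/anything/"))  # hostname.com
print(naive_domain("http://www.example.co.uk/page"))  # co.uk, which is not the site's domain
```

It uses only built-in string methods, so it behaves the same in Python 2 and Python 3.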
To extract URLs from a text file, you can use a regular expression: it returns every piece of text that matches the pattern. Only the standard re module is needed for this.
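As a rough sketch of the regular-expression approach, the snippet below pulls http and https URLs out of a string. The pattern is a deliberately simple assumption (it stops at whitespace, quotes, and angle brackets) and is not a full URL grammar; the names URL_PATTERN and extract_urls are illustrative. For a file, the text would come from reading the file's contents instead of the sample string.

```python
import re

# Simplified pattern: scheme, then everything up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def extract_urls(text):
    # findall returns all non-overlapping matches, in order of appearance.
    return URL_PATTERN.findall(text)

sample = "See http://abc.hostname.com/somethings/anything/ and https://www.example.test/foo/bar for details."
print(extract_urls(sample))
```

With this pattern, punctuation that directly follows a URL (such as a trailing period) would be captured as part of the match, so real input may need some cleanup.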
For parsing the domain of a URL in Python 3, you can use:
    from urllib.parse import urlparse

    domain = urlparse('http://www.example.test/foo/bar').netloc
    print(domain)  # --> www.example.test
However, to reliably extract the registered domain (example.test in this example, as opposed to the top-level domain, which is just test), you need to install a specialized library (e.g., tldextract).