 

Python 2 and 3: extract domain from URL

I have a URL like: http://xxx.abcdef.com/fdfdf/

And I want to get xxx.abcdef.com.

Which module can I use to accomplish this?

I want to use the same module and method in Python 2 and Python 3.

I don't like the try/except approach to Python 2/3 compatibility.

Thank you so much!

asked Feb 04 '14 at 21:02 by fj123x


People also ask

How do I get the domain of a URL in Python?

To get the domain from a URL in Python, the easiest way is to call the urlparse() function from the urllib.parse module and read the netloc attribute of its result. When working with URLs in Python, being able to pull out their individual components easily is very useful.

How do I remove a domain from a URL in Python?

First split on "http://" to remove the scheme from the string. Then split on "/" to discard any directory or subdirectory parts, which leaves only the host. Finally, split the host on ".", take the second-to-last token ([-2]) and join it with the last token ([-1]). For xxx.abcdef.com this gives abcdef.com, i.e. the registered domain (second-level domain plus top-level domain), not just the top-level domain com.
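A minimal sketch of that split-based approach, assuming the URL starts with "http://" and the host has at least two labels. The helper name naive_registered_domain is hypothetical. The second call shows where the heuristic breaks: suffixes such as co.uk span two labels, so handling them properly needs a list of public suffixes rather than a fixed [-2:] slice.

```python
def naive_registered_domain(url):
    """Hypothetical helper: last two dot-separated labels of the host."""
    # Drop the "http://" prefix (if any), then everything after the first "/".
    host = url.split("http://")[-1].split("/")[0]
    labels = host.split(".")
    # Join the second-to-last and last labels, e.g. "abcdef" + "com".
    return ".".join(labels[-2:])

print(naive_registered_domain("http://xxx.abcdef.com/fdfdf/"))  # abcdef.com
# Multi-label public suffixes defeat the heuristic:
print(naive_registered_domain("http://www.example.co.uk/"))     # co.uk
```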

What is Netloc in Python?

netloc: contains the network location, which includes the domain itself (and the subdomain, if present), the port number, and optional credentials in the form username:password. Taken together it may look like username:password@hostname:80.
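To see how those pieces map onto the parse result, here is a small Python 3 sketch using a made-up variant of the question's URL with credentials and a port added (the user, secret and 8080 values are illustrative only). netloc keeps everything between "//" and the path, while hostname, port, username and password split it apart; note that hostname is lowercased.

```python
from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse

# Illustrative URL: credentials and an explicit port added to the question's host.
parts = urlparse("http://user:secret@XXX.abcdef.com:8080/fdfdf/")

print(parts.netloc)    # user:secret@XXX.abcdef.com:8080
print(parts.hostname)  # xxx.abcdef.com  (host only, lowercased)
print(parts.port)      # 8080            (an int, or None when absent)
print(parts.username)  # user
print(parts.password)  # secret
```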


2 Answers

Use urlparse:

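# Python 2 (print(x) with one argument also works on Python 2)
from urlparse import urlparse
o = urlparse("http://xxx.abcdef.com/fdfdf/")
print(o)         # the whole ParseResult (scheme, netloc, path, ...)

print(o.netloc)  # xxx.abcdef.com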

In Python 3, you import urlparse like so:

from urllib.parse import urlparse

Alternatively, just use str.split():

url = "http://xxx.abcdef.com/fdfdf/"

# url.split('/') gives ['http:', '', 'xxx.abcdef.com', 'fdfdf', '']
print(url.split('/')[2])  # xxx.abcdef.com
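The split trick and urlparse agree on the question's URL, but they diverge once the host carries a port or credentials. A rough Python 3 comparison, with hypothetical helper names host_by_split and host_by_urlparse. Both approaches assume a "scheme://" prefix: without "//", urlparse treats the whole string as a path and leaves netloc empty.

```python
from urllib.parse import urlparse  # Python 3

def host_by_split(url):
    # Third "/"-separated piece: the raw authority (may include port/credentials).
    return url.split("/")[2]

def host_by_urlparse(url):
    # .hostname strips credentials and port, and lowercases the host.
    return urlparse(url).hostname

for url in [
    "http://xxx.abcdef.com/fdfdf/",          # URL from the question
    "http://xxx.abcdef.com:8080/fdfdf/",     # explicit port
    "http://user:pw@xxx.abcdef.com/fdfdf/",  # embedded credentials
]:
    print(host_by_split(url), "|", host_by_urlparse(url))
# xxx.abcdef.com | xxx.abcdef.com
# xxx.abcdef.com:8080 | xxx.abcdef.com
# user:pw@xxx.abcdef.com | xxx.abcdef.com
```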

Sidenote: Here's how you write an import of urlparse that will work in either version:

import sys

if sys.version_info >= (3, 0):
    from urllib.parse import urlparse
elif sys.version_info >= (2, 5):
    from urlparse import urlparse
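Wrapped into a small reusable function, the version check gives one call site that runs unchanged on Python 2.5+ and Python 3, without any try/except. This is only a sketch; get_domain is a hypothetical name, and it returns netloc, so a port or credentials in the URL would be included (use .hostname instead to drop them).

```python
import sys

# Pick the module by interpreter version rather than by catching ImportError.
if sys.version_info[0] >= 3:
    from urllib.parse import urlparse
else:
    from urlparse import urlparse  # Python 2

def get_domain(url):
    """Hypothetical helper: network location (host[:port]) of url."""
    return urlparse(url).netloc

print(get_domain("http://xxx.abcdef.com/fdfdf/"))  # xxx.abcdef.com
```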
answered Oct 28 '22 at 21:10 by jgritty


You can use the third-party library six, which takes care of compatibility issues between Python versions, together with the standard library function urlparse to extract the hostname.

So all you need to do is install six and import urlparse through it:

from six.moves.urllib.parse import urlparse
u = urlparse("http://xxx.abcdef.com/fdfdf/")
print(u.hostname)

More on urlparse here

answered Oct 28 '22 at 19:10 by swapnil jariwala