Extract domain name from URL in Python

Tags:

python

regex

url

package

server

I am trying to extract the domain names from a list of URLs, just like in https://stackoverflow.com/questions/18331948/extract-domain-name-from-the-url
My problem is that the URLs can be just about anything. A few examples:
m.google.com => google
m.docs.google.com => google
www.someisotericdomain.innersite.mall.co.uk => mall
www.ouruniversity.department.mit.ac.us => mit
www.somestrangeurl.shops.relevantdomain.net => relevantdomain
www.example.info => example
And so on.
The diversity of the domains doesn't allow me to use a regex as shown in "how to get domain name from URL": my script will run on an enormous number of URLs from real network traffic, so the regex would have to be enormous to catch all the kinds of domains mentioned above.
Unfortunately, my web research didn't turn up any efficient solution.
Does anyone have an idea of how to do this?
Any help will be appreciated!
Thank you

asked May 17 '17 10:05

kobibo

3 Answers

Use tldextract. Unlike urlparse, which only splits a URL into components such as the scheme and netloc, tldextract accurately separates the gTLD or ccTLD (generic or country code top-level domain) from the registered domain and subdomains of a URL.

>>> import tldextract
>>> ext = tldextract.extract('http://forums.news.cnn.com/')
>>> ext
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
>>> ext.domain
'cnn'
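
To give a rough idea of why a suffix list handles hosts like mall.co.uk where a fixed label position does not, here is a minimal stdlib-only sketch of the matching idea. It is not tldextract's actual implementation: KNOWN_SUFFIXES and registered_name are hypothetical names, and the tiny hand-written suffix set stands in for the full Public Suffix List that a real tool would load.

```python
# Hypothetical, hand-written suffix set for illustration only.
# A real implementation would load the full Public Suffix List instead.
KNOWN_SUFFIXES = {"com", "net", "info", "uk", "co.uk"}


def registered_name(host):
    """Return the label immediately left of the longest matching suffix."""
    labels = host.lower().strip(".").split(".")
    # Scan from the longest candidate suffix to the shortest,
    # so that 'co.uk' is matched before 'uk'.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            # i == 0 means the whole host is a suffix: no registered name.
            return labels[i - 1] if i > 0 else None
    return None  # no known suffix matched


for host in (
    "m.docs.google.com",
    "www.someisotericdomain.innersite.mall.co.uk",
    "www.example.info",
):
    print(host, "=>", registered_name(host))
```

With this small set, the three hosts from the question map to google, mall and example. Any suffix missing from the set simply returns None, which is why the real list matters for real traffic.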

answered Oct 10 '22 20:10

akash karothiya

It seems you can use urlparse (https://docs.python.org/3/library/urllib.parse.html) on the URL and then read its netloc attribute.

From the netloc you can then extract the domain name using split, although picking a fixed label position breaks for multi-part suffixes such as co.uk.
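
A minimal sketch of that approach, under a couple of assumptions: hostname_of and naive_domain are hypothetical helper names, and inputs without a scheme (like the hosts in the question) get a "//" prefix because urlparse otherwise treats them as a path rather than a netloc.

```python
from urllib.parse import urlparse


def hostname_of(url):
    """Return the host part of a URL, with or without a scheme."""
    # urlparse only fills netloc when '//' precedes the host, so bare
    # hosts such as 'm.google.com' get a '//' prefix first.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # .hostname is netloc lowercased, without any port or credentials.
    return urlparse(url).hostname or ""


def naive_domain(url):
    """Second-to-last label of the host; wrong for suffixes like 'co.uk'."""
    labels = hostname_of(url).split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


print(naive_domain("http://forums.news.cnn.com/"))  # cnn
print(naive_domain("m.docs.google.com"))            # google
print(naive_domain("www.mall.co.uk"))               # co (not 'mall')
```

The last call shows the limitation: the split has no way of knowing that co.uk is a two-label suffix, so some suffix list is still needed for that case.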

answered Oct 10 '22 22:10

Mariano Anaya

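Simple solution via string splitting (no regex needed). Note that it returns the first label after an optional scheme and "www." prefix, so it works for www.example.info but returns m for m.google.com.

def domain_name(url):
    # Drop everything up to "www." and "//", then keep the first dot-separated label.
    return url.split("www.")[-1].split("//")[-1].split(".")[0]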

answered Oct 10 '22 21:10

Sharif O
