
Parsing string for Domain / hostName

Our customers can enter their websites as domain names. They can also enter the email addresses of their contacts.

Now we need to find the customers whose website domain can be associated with the domains of those email addresses.

So my idea is to extract the host from the email address and from the URL and compare them.

What is the most reliable algorithm to get the hostname from a URL?

For example, the input can be any of:

foo.com
www.foo.com
http://foo.com
https://foo.com
https://www.foo.com

The result should always be foo.com

asked May 24 '12 09:05 by Boas Enkler



2 Answers

Rather than relying on a fragile regex, use System.Uri to do the parsing for you. For example:

string uriStr = "www.foo.com";
if (!uriStr.Contains(Uri.SchemeDelimiter)) {
    uriStr = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriStr);
}
Uri uri = new Uri(uriStr);
string domain = uri.Host; // will return www.foo.com

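Note that uri.Host keeps the www. prefix, and GetLeftPart does not remove it either; it returns the scheme plus the authority:

string authority = uri.GetLeftPart( UriPartial.Authority ); // will return http://www.foo.com

To end up with foo.com, strip a leading "www." from uri.Host before comparing.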
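
For comparison, here is a minimal JavaScript sketch of the same "let a URL parser do the work" idea as the C# snippet above. It assumes a runtime that provides the standard URL class (modern browsers, Node.js). The helper name hostFromInput is made up for illustration, and the www. stripping is an assumption about what should count as "the same domain"; hosts such as shop.foo.com or foo.co.uk would need a public suffix list instead.

```javascript
// Hypothetical helper: reduce user input to a bare host name.
function hostFromInput(input) {
  let text = input.trim();
  // Add a default scheme when none is present, as the C# snippet does.
  if (!text.includes('://')) {
    text = 'http://' + text;
  }
  // new URL() throws a TypeError if the text cannot be parsed.
  const host = new URL(text).hostname; // lower-cased, without the port
  // Assumption: only a single leading "www." label is dropped.
  return host.startsWith('www.') ? host.slice(4) : host;
}

const samples = [
  'foo.com',
  'www.foo.com',
  'http://foo.com',
  'https://foo.com',
  'https://www.foo.com',
];
// Each sample from the question reduces to 'foo.com'.
console.log(samples.map(hostFromInput));
```

Because the parser handles ports, paths and query strings, input like https://www.foo.com:8080/a?b=c also comes out as foo.com.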
answered Sep 22 '22 15:09 by anubhava



Here's a regular expression that will match the URLs you have provided. Basically, the http:// or https:// scheme is optional, as is the www. prefix. Everything after that is matched up to a possible path:

var expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;

This means that

var result = 'https://www.foo.com.vu/blah'.replace(expression, '$3')

evaluates to

result === 'foo.com.vu'
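
The expression above is not anchored at the start and keeps ports, query strings and upper-case letters in the result, so something like foo.com:8080 or FOO.com would not compare equal to foo.com. A slightly stricter variant could look like the sketch below. The names hostPattern and stripToHost are hypothetical, and it still assumes that www. is the only prefix you want to ignore.

```javascript
// Anchored, case-insensitive variant of the expression above.
// The capture group stops at the first '/', ':', '?' or '#'.
const hostPattern = /^(?:https?:\/\/)?(?:www\.)?([^\/:?#]*)/i;

// Hypothetical helper: returns the host part in lower case.
function stripToHost(input) {
  // Every part of the pattern is optional, so exec() always matches;
  // unusable input simply yields an empty string.
  const match = hostPattern.exec(input.trim());
  return match[1].toLowerCase();
}

console.log(stripToHost('https://www.foo.com.vu/blah'));
console.log(stripToHost('HTTPS://WWW.Foo.com:8080/path?x=1'));
```

This is still pattern matching rather than real URL parsing, so input containing user info (user@foo.com) or other schemes will not be handled correctly.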
answered Sep 26 '22 15:09 by cmilhench