Our customers can enter websites as domain names. They can also enter email addresses for their contacts.
Now we need to find customers whose website domain can be associated with the domains of those email addresses.
So my idea is to extract the host from the web address and from the email address and compare them.
So what's the most reliable algorithm to get the hostname from a URL?
For example, a host can be entered as:
foo.com
www.foo.com
http://foo.com
https://foo.com
https://www.foo.com
The result should always be foo.com
A domain parser splits a hostname into subdomains, domain and (effective) top-level domains. Since domain name registrars organize their namespaces in different ways, it's not straightforward to split a hostname into subdomains, the domain and top-level domains.
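As a rough illustration of why a suffix list is needed, here is a minimal sketch of such a split. The SUFFIXES set is a hypothetical, deliberately tiny stand-in for the full Public Suffix List, and splitHostname is a name made up for this example, not the API of any particular library.

```javascript
// Hypothetical, deliberately tiny subset of public suffixes (assumption:
// a real implementation would load the full Public Suffix List instead).
const SUFFIXES = new Set(['com', 'org', 'xyz', 'co.uk', 'com.vu']);

// Split a hostname into subdomain, domain and public suffix by finding
// the longest entry in SUFFIXES that the hostname ends with.
function splitHostname(hostname) {
  const labels = hostname.toLowerCase().split('.');
  // Start at i = 1 so that at least one label remains for the domain.
  // Smaller i means a longer candidate suffix, so the first hit is the longest.
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join('.');
    if (SUFFIXES.has(suffix)) {
      return {
        subdomain: labels.slice(0, i - 1).join('.'),
        domain: labels[i - 1],
        suffix,
      };
    }
  }
  return null; // no known suffix, or nothing left over for the domain
}

const parts = splitHostname('www.foo.co.uk');
console.log(parts.domain + '.' + parts.suffix); // foo.co.uk
```

With only 'uk' in the list, the same input would wrongly yield co.uk as the domain, which is why the suffix data matters more than the splitting logic.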
First, create a string with the URL (note: if the URL isn't correctly structured, new URL() throws an error):
const url = 'https://www.michaelburrows.xyz/blog?search=hello&world';
Next, create a URL object using the new URL() constructor:
let domain = new URL(url);
The hostname is then available as domain.hostname (here www.michaelburrows.xyz).
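The URL constructor requires a scheme, so bare entries such as foo.com or www.foo.com from the question need one added first. The following is a minimal sketch under that assumption; hostnameOf is a hypothetical helper name, and stripping only a leading "www." is a simplification that does not handle other subdomains.

```javascript
// Return the hostname of a user-entered website without a leading "www.".
// Returns null when the input cannot be parsed as a URL.
function hostnameOf(input) {
  let text = input.trim();
  // new URL() needs a scheme, so assume http:// for bare hosts like "foo.com".
  if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(text)) {
    text = 'http://' + text;
  }
  try {
    // For http and https URLs the parsed hostname is already lowercased.
    const { hostname } = new URL(text);
    return hostname.replace(/^www\./, '');
  } catch {
    return null; // not a parseable URL
  }
}

const inputs = ['foo.com', 'www.foo.com', 'http://foo.com',
                'https://foo.com', 'https://www.foo.com'];
console.log(inputs.map(hostnameOf)); // five times 'foo.com'
```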
Use ICANN Lookup: go to lookup.icann.org. In the search field, enter your domain name and click Lookup. In the results page, scroll down to Registrar Information. The registrar is usually your domain host.
Domain names are used in URLs to identify specific websites. For example, in the URL "http://www.example.com/index.html", the hostname is "www.example.com" and the registered domain is "example.com". The "whois" command looks up registration data for a domain, such as the registrar and name servers; it does not extract the suffix. For example, "whois example.com" returns the registration record for example.com, whose suffix is ".com".
Rather than relying on an unreliable regex, use System.Uri to do the parsing for you. Use code like this:
string uriStr = "www.foo.com";
if (!uriStr.Contains(Uri.SchemeDelimiter)) {
    uriStr = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriStr);
}
Uri uri = new Uri(uriStr);
string domain = uri.Host; // will return www.foo.com
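Note that GetLeftPart(UriPartial.Authority) does not strip the subdomain; it returns the scheme and authority:
string authority = uri.GetLeftPart(UriPartial.Authority); // will return http://www.foo.com
To get foo.com for the examples in the question, remove a leading "www." from the host:
string bare = uri.Host.StartsWith("www.", StringComparison.OrdinalIgnoreCase) ? uri.Host.Substring(4) : uri.Host; // will return foo.com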
Here's a regular expression that will match the URLs you have provided. Basically, the http:// or https:// scheme is optional, as is the www. prefix. Everything after that is captured up to a possible path:
var expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;
This would mean that:
var result = 'https://www.foo.com.vu/blah'.replace(expression, '$3');
would evaluate to:
result === 'foo.com.vu'
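To tie this back to the original goal of matching websites to email addresses, here is a minimal sketch that reuses the expression above. The helper names websiteHost, emailDomain and sameDomain are hypothetical, and treating subdomains such as mail.foo.com as a match for foo.com is an assumption about what "associated" should mean. The input is lowercased first because the expression only recognizes a lowercase scheme and www prefix.

```javascript
const expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;

// Host part of a website entry, extracted with the expression above.
function websiteHost(website) {
  return website.trim().toLowerCase().replace(expression, '$3');
}

// Domain part of an email address: everything after the last "@".
function emailDomain(email) {
  const text = email.trim().toLowerCase();
  return text.slice(text.lastIndexOf('@') + 1);
}

// True when the two domains are equal or one is a subdomain of the other.
function sameDomain(website, email) {
  const host = websiteHost(website);
  const domain = emailDomain(email);
  return host === domain ||
    host.endsWith('.' + domain) ||
    domain.endsWith('.' + host);
}

console.log(sameDomain('https://www.foo.com/contact', 'jane@foo.com')); // true
console.log(sameDomain('www.foo.com', 'jane@bar.com'));                 // false
```

Note that the expression keeps a port if one is present (for example foo.com:8080), so entries with ports would need extra handling.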