Extract hostname from a URL

Tags: sql, sql-server

I need to trim a website address down to its host name, ending in ".com" or ".co.in", in SQL.

Example: suppose the site address is "http://stackoverflow.com/questions/ask?title=trim". I need to get "stackoverflow.com" as the result.

In some cases the address may look like "www.google.co.in", and then I need "google.co.in".

asked Nov 07 '12 09:11 by user1117040



2 Answers

I know this is an old thread, but I was trying to do this recently and the answers here either did not cover strings starting with http/https or did not cover the new gTLDs. So here is what I came up with, using CTEs to keep it as readable and understandable as possible.

Hopefully it will help anyone stumbling upon this thread in the future!

DECLARE @Var NVARCHAR(1000)
SET @Var='http://stackoverflow.com/questions/ask?title=trim';

WITH cteWithoutWWW (Domain)
as
(
    SELECT
      case when PATINDEX('%www.%', @Var) > 0 then
            SUBSTRING(@Var, PATINDEX('%www.%', @Var) + 4, LEN(@Var) - PATINDEX('%www.%', @Var))
      else
            @Var
      end
),
cteWithoutHTTP (Domain)
as
(
      select
      -- Strip a leading scheme; 'https://' must be handled as well as 'http://'
      case when Domain like 'http://%' then
            SUBSTRING(Domain, 8, LEN(Domain))
      when Domain like 'https://%' then
            SUBSTRING(Domain, 9, LEN(Domain))
      else
            Domain
      end
      from cteWithoutWWW
),
cteWithoutSlash (Domain)
as
(
      select
      case when CHARINDEX('/', Domain) > 0 then
            SUBSTRING(Domain, 0, CHARINDEX('/', Domain))
      else
            Domain
      end
      from cteWithoutHTTP
)
select Domain from cteWithoutSlash 
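
To try the same three steps against several addresses at once, something like the following might work. It is only a sketch under a few assumptions: the inline list `u` and its column `Url` are made-up names, the steps are chained with CROSS APPLY rather than CTEs, the scheme is removed before "www.", and "www." is only stripped when it is at the very start of the host.

```sql
-- Hypothetical sample data; u and Url are illustrative names, not from the answer.
SELECT u.Url, s3.Domain
FROM (VALUES
        (N'http://stackoverflow.com/questions/ask?title=trim'),
        (N'www.google.co.in'),
        (N'https://www.example.com/path')
     ) AS u (Url)
-- Step 1: drop a leading http:// or https://
CROSS APPLY (SELECT CASE
                      WHEN u.Url LIKE 'http://%'  THEN SUBSTRING(u.Url, 8, LEN(u.Url))
                      WHEN u.Url LIKE 'https://%' THEN SUBSTRING(u.Url, 9, LEN(u.Url))
                      ELSE u.Url
                    END) AS s1 (Domain)
-- Step 2: drop a leading www.
CROSS APPLY (SELECT CASE
                      WHEN s1.Domain LIKE 'www.%' THEN SUBSTRING(s1.Domain, 5, LEN(s1.Domain))
                      ELSE s1.Domain
                    END) AS s2 (Domain)
-- Step 3: cut at the first '/', if there is one
CROSS APPLY (SELECT CASE
                      WHEN CHARINDEX('/', s2.Domain) > 0
                        THEN LEFT(s2.Domain, CHARINDEX('/', s2.Domain) - 1)
                      ELSE s2.Domain
                    END) AS s3 (Domain);
```

For these three rows the Domain column should come back as stackoverflow.com, google.co.in and example.com. Anchoring the "www." check to the start of the host avoids cutting at a "www." that happens to appear later in the path or query string.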
answered Sep 19 '22 06:09 by discorevilo


I found that there can be a lot of variation here, especially when running against a table of referrers. For this reason I created a SQL script that gets the host name from a web address that also covers all of the edge cases I found.

DECLARE @WebAddress varchar(300) = 'https://www.stevefenton.co.uk/2015/09/select-the-host-name-from-a-string-in-sql/'
SELECT 
    /* Get just the host name from a URL */
    SUBSTRING(@WebAddress,
        /* Starting Position (After any '//') */
        (CASE WHEN CHARINDEX('//', @WebAddress)= 0 THEN 1 ELSE CHARINDEX('//', @WebAddress) + 2 END),
        /* Length (ending on first '/' or on a '?') */
        CASE
            WHEN CHARINDEX('/', @WebAddress, CHARINDEX('//', @WebAddress) + 2) > 0 THEN CHARINDEX('/', @WebAddress, CHARINDEX('//', @WebAddress) + 2) - (CASE WHEN CHARINDEX('//', @WebAddress)= 0 THEN 1 ELSE CHARINDEX('//', @WebAddress) + 2 END)
            WHEN CHARINDEX('?', @WebAddress, CHARINDEX('//', @WebAddress) + 2) > 0 THEN CHARINDEX('?', @WebAddress, CHARINDEX('//', @WebAddress) + 2) - (CASE WHEN CHARINDEX('//', @WebAddress)= 0 THEN 1 ELSE CHARINDEX('//', @WebAddress) + 2 END)
            ELSE LEN(@WebAddress)
        END
    ) AS 'HostName'

This will handle...

  • An address starting with www. (i.e. no scheme)
  • An address starting with //
  • Host names that terminate with a /
  • Host names that terminate with a query string
answered Sep 20 '22 06:09 by Fenton