Parse Domain from a given URL in T-SQL

I found this answer, but wanted to expand on the question and couldn't find a solution here on Stack Overflow or by searching Google.

Substring domainname from URL SQL

Basically, the link above solves my problem for a simple URL: parsing 'www.google.com' returns 'google'.

However, that solution doesn't help with URLs like 'www.maps.google.com', for which it just returns 'maps'.

What I would like is for it to return 'google' from the URL 'www.maps.google.com', or 'example' from 'www.test.example.com'.

If anyone has a solution to this, I would greatly appreciate it.

Update: To be more specific, I will also need to parse URLs with second-level domains, e.g. 'www.maps.google.com.au' should return 'google'.

Here is my SQL function:

CREATE FUNCTION [dbo].[parseURL]  (@strURL varchar(1000))
RETURNS varchar(1000)
AS
BEGIN

    IF CHARINDEX('.', REPLACE(@strURL, 'www.', '')) > 0
        SELECT @strURL = LEFT(REPLACE(@strURL, 'www.', ''), CHARINDEX('.', REPLACE(@strURL, 'www.', '')) - 1)
    ELSE
        SELECT @strURL = REPLACE(@strURL, 'www.', '')

    RETURN @strURL
END
Adam N asked Dec 05 '12 22:12


1 Answer

I'd suggest this:

DECLARE @URL nvarchar(max) = 'www.maps.google.com'

DECLARE @X xml = CONVERT(xml,'<root><part>' + REPLACE(@URL, '.','</part><part>') + '</part></root>')

SELECT [Domain] = T.c.value('.','nvarchar(256)')
FROM @X.nodes('/root/part[position() = last() - 1]') T(c)

The approach is to convert the URL to XML and then use XPath to find the domain.
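
To make the intermediate step visible, here is a minimal sketch of the same idea that first lists every label and then picks the second-to-last one with value() instead of nodes(). The variable names are illustrative only, and it assumes the input is a bare host name: characters such as '&' or '<' would make the XML conversion fail.

```sql
DECLARE @URL nvarchar(max) = N'www.maps.google.com';

-- Turn the dots into element boundaries:
-- <root><part>www</part><part>maps</part><part>google</part><part>com</part></root>
DECLARE @X xml = CONVERT(xml, N'<root><part>' + REPLACE(@URL, N'.', N'</part><part>') + N'</part></root>');

-- One row per label, to inspect what the XPath has to work with.
SELECT T.c.value('.', 'nvarchar(256)') AS Label
FROM @X.nodes('/root/part') AS T(c);

-- The second-to-last label; NULL when the host has only one label.
SELECT @X.value('(/root/part[position() = last() - 1])[1]', 'nvarchar(256)') AS Domain;
```

For 'www.maps.google.com' the last query should give 'google'. For 'www.maps.google.com.au' it would give 'com', because the second-to-last label is then the second-level domain rather than the name being sought.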

UPDATE

Regarding second-level domains, I believe the only reliable way is to have them all in a table (top-level domains should probably be in a table too), and then you could use this query:

DECLARE @URL nvarchar(max) = 'www.maps.google.com'

DECLARE @X xml = CONVERT(xml,'<root><part>' + REPLACE(REVERSE(@URL), '.','</part><part>') + '</part></root>')

;WITH SplitCTE AS
(
    SELECT
        -- Select each part by position, so a repeated label (e.g. 'com.example.com') cannot return more than one row.
        (SELECT REVERSE(T.c.value('.', 'nvarchar(256)')) FROM @X.nodes('/root/part[position() = 1]') T(c)) AS TLD,
        (SELECT REVERSE(T.c.value('.', 'nvarchar(256)')) FROM @X.nodes('/root/part[position() = 2]') T(c)) AS D2,
        (SELECT REVERSE(T.c.value('.', 'nvarchar(256)')) FROM @X.nodes('/root/part[position() = 3]') T(c)) AS D3
)
SELECT 
    CASE
        WHEN SLD.Domain IS NULL THEN S.D2 ELSE S.D3
    END AS Domain
FROM
    SplitCTE AS S
    LEFT JOIN TLD ON TLD.Domain = S.TLD
    LEFT JOIN SLD ON SLD.Domain = S.D2

The TLD/SLD tables I used for this example are below. The full list of domains is in this wiki. Be careful to use NVARCHAR, as some domains are localized.

CREATE TABLE dbo.TLD
(
    Domain nvarchar(10)
)
GO

CREATE TABLE dbo.SLD
(
    Domain nvarchar(10)
)
GO

INSERT TLD VALUES ( 'com')
INSERT TLD VALUES ( 'uk')
INSERT SLD VALUES ( 'co')
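
As a rough, self-contained sketch of how the lookup behaves across several host names, the batch below uses table variables in place of the dbo tables. The names @SLD and @Urls and the sample rows are assumptions made for illustration, and it counts labels from the right with last() rather than reversing the string. Note that 'com' has to be listed as a second-level domain for '.com.au' to resolve.

```sql
-- Hypothetical lookup data; a real list would be far longer.
DECLARE @SLD TABLE (Domain nvarchar(10) PRIMARY KEY);
INSERT @SLD VALUES (N'co'), (N'com');   -- covers .co.uk and .com.au

DECLARE @Urls TABLE (Url nvarchar(400));
INSERT @Urls VALUES
    (N'www.maps.google.com'),
    (N'www.maps.google.com.au'),
    (N'www.test.example.co.uk');

SELECT
    U.Url,
    -- If the second label from the right is a known SLD, the name is one label further left.
    CASE WHEN S.Domain IS NULL THEN P.D2 ELSE P.D3 END AS Domain
FROM @Urls AS U
CROSS APPLY (
    -- Split the host name into <part> elements, one per label.
    SELECT CONVERT(xml, N'<root><part>' + REPLACE(U.Url, N'.', N'</part><part>') + N'</part></root>') AS X
) AS C
CROSS APPLY (
    SELECT
        C.X.value('(/root/part[position() = last() - 1])[1]', 'nvarchar(256)') AS D2,
        C.X.value('(/root/part[position() = last() - 2])[1]', 'nvarchar(256)') AS D3
) AS P
LEFT JOIN @SLD AS S ON S.Domain = P.D2;
```

With these rows the query should return 'google', 'google' and 'example'. Matching on the second label alone can misfire (for instance 'www.co.com' would yield 'www'), so a more careful version would probably check SLD and TLD together as a pair.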
Serge Belov answered Oct 19 '22 18:10