I have a data column in my table, and each row of this column can contain zero, one or multiple URLs alongside other text. I would like to extract these URLs into a new dataset containing only the URLs.
Why? Because I want to add some of these URLs to a block list in my DB to prevent spam.
For example, I have this text in the data column:
hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.
From this I want to get all the URLs in the text:
httx://portugal-forex.com/
httx://phen375treatment.com/
httx://priligy2000.org/
And so on.
I really don't know where to start to do this in SQL.
nvarchar [ ( n | max ) ]: n defines the string size in byte-pairs and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^30-1 characters (2 GB). The storage size is two times n bytes + 2 bytes.
VARCHAR(MAX) and NVARCHAR(MAX) are considered 'large value types'. By default, a large value is kept in the data row when it fits; otherwise it is stored 'out of row', which means the data row holds a pointer to another location where the large value is stored.
The key difference between varchar and nvarchar is the way they store characters: varchar uses 1 byte per character for single-byte code pages, while nvarchar uses 2 bytes per character. For this reason, n in nvarchar(n) can be at most 4,000 (versus 8,000 for varchar(n)), and the same text takes about twice as much space as in varchar.
If you don't specify the string length in a declaration, it defaults to 1 (in CAST and CONVERT it defaults to 30). With max, the maximum storage size is 2^31-1 bytes (2 GB). In general, the actual storage size in bytes of an NVARCHAR value is two times the number of characters entered plus 2 bytes.
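The byte counts described above can be checked with LEN (characters, ignoring trailing spaces) and DATALENGTH (bytes). A small sketch; note that DATALENGTH reports only the data bytes, not the extra 2 bytes of storage overhead mentioned above, and the variable name @NoLength is just for illustration.

```sql
-- LEN counts characters; DATALENGTH counts bytes of the value itself
SELECT LEN(N'abc')        AS NvarcharChars,  -- 3
       DATALENGTH(N'abc') AS NvarcharBytes,  -- 6: 2 bytes per character
       LEN('abc')         AS VarcharChars,   -- 3
       DATALENGTH('abc')  AS VarcharBytes;   -- 3: 1 byte per character

-- Declaring NVARCHAR without a length gives NVARCHAR(1),
-- so the assigned value is silently truncated
DECLARE @NoLength NVARCHAR = N'hello';
SELECT @NoLength AS Truncated;               -- N'h'
```

This is also why the function below declares its parameter as NVARCHAR(MAX) rather than plain NVARCHAR.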
Here is an example. I search for the string from "httx://" up to the first "/" after it:
In any case, you will need to process the rows one by one.
Put the code into a function (this assumes a schema named Temporary already exists; use your own schema otherwise):
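CREATE FUNCTION Temporary.getLinksFromText (@Tekstas NVARCHAR(MAX))
RETURNS @Data TABLE(TheLink NVARCHAR(500))
AS
BEGIN
    DECLARE @FirstIndexOfChar INT,            -- position of the next 'httx://'
            @LastIndexOfChar INT,             -- position of the first '/' after it
            @LengthOfStringBetweenChars INT,
            @String NVARCHAR(500);            -- NVARCHAR, so Unicode input is not lost

    SET @FirstIndexOfChar = CHARINDEX('httx://', @Tekstas, 1);

    WHILE @FirstIndexOfChar > 0
    BEGIN
        -- Skip the 7 characters of 'httx://' and find the '/' that ends the host part
        SET @LastIndexOfChar = CHARINDEX('/', @Tekstas, @FirstIndexOfChar + 7);

        -- No closing '/': stop instead of passing a negative length to SUBSTRING
        IF @LastIndexOfChar = 0
            BREAK;

        SET @LengthOfStringBetweenChars = @LastIndexOfChar - @FirstIndexOfChar + 1;
        SET @String = SUBSTRING(@Tekstas, @FirstIndexOfChar, @LengthOfStringBetweenChars);
        INSERT INTO @Data (TheLink) VALUES (@String);

        -- Continue searching in the rest of the text, starting at the closing '/'
        SET @Tekstas = SUBSTRING(@Tekstas, @LastIndexOfChar, LEN(@Tekstas));
        SET @FirstIndexOfChar = CHARINDEX('httx://', @Tekstas, 1);
    END
    RETURN;
END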
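Once the function exists, a quick way to check it is to call it directly on a literal string, without any table. A small sketch; the domains example-one.com and example-two.org are made up for illustration, and both links end with "/" because the function only returns text up to that character.

```sql
-- Assumes Temporary.getLinksFromText has already been created
SELECT TheLink
FROM Temporary.getLinksFromText(
    N'spam <a href="httx://example-one.com/">x</a>, [url=httx://example-two.org/]y[/url]'
);
-- Expected rows: httx://example-one.com/ and httx://example-two.org/
```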
Create some test data:
CREATE TABLE #Data(weLink NVARCHAR(MAX));
INSERT INTO #Data VALUES
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.'),
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.')
Then you can call it for every row like this, without a cursor:
SELECT allLinks.*
FROM #Data AS D
OUTER APPLY Temporary.getLinksFromText (D.weLink) AS allLinks
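Because each spam text repeats every link several times (in the href, in the [url] tag and as plain text), the query above returns duplicates. For the block-list use case, one possible sketch is to keep only distinct links and skip those already listed. #BlockList is a hypothetical stand-in for the real block-list table in your DB; CROSS APPLY is used instead of OUTER APPLY so that rows without any link produce no NULL rows.

```sql
-- Hypothetical block-list table; replace with your real table
CREATE TABLE #BlockList (Link NVARCHAR(500) NOT NULL);

-- Add each distinct link once, skipping links that are already blocked
INSERT INTO #BlockList (Link)
SELECT DISTINCT allLinks.TheLink
FROM #Data AS D
CROSS APPLY Temporary.getLinksFromText(D.weLink) AS allLinks
WHERE NOT EXISTS (SELECT 1 FROM #BlockList AS B WHERE B.Link = allLinks.TheLink);

SELECT Link FROM #BlockList ORDER BY Link;
```

With the test data above, this should leave six links in #BlockList, one per spam domain.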