I'm trying to create an inline table function to strip out HTML (so that I can CROSS APPLY it with another table). We have a scalar function in place, but the performance is just not good enough.
I'm really new to using T-SQL, so I don't know that many good practices of it, so if anyone has any suggestions, please let me know!
This is what I have so far:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION StripHTML
(
@text VARCHAR(MAX)
)
RETURNS TABLE
AS
RETURN
WITH BracketIndexes (startIndex, stopIndex) AS (SELECT
CHARINDEX('<', @text),
CHARINDEX('>', @text) UNION ALL SELECT
CHARINDEX('<', @text, stopIndex + 1),
CHARINDEX('>', @text, stopIndex + 1)
FROM BracketIndexes
WHERE startIndex > 0 AND stopIndex > 0
)
SELECT * FROM BracketIndexes
Now that I have the indices of where the <s and > are, I'm not sure how to:
I'm thinking I can just concat the left side and right side of each bracket using LEFT/RIGHT, but I can't seem to get the indices correct.
Edit:
This kinda worked:
DECLARE @text VARCHAR(100) = 'HELLO <B>WORLD</B> asd';
WITH BracketIndexes (startIndex, stopIndex) AS (
SELECT
CHARINDEX('<', @text),
CHARINDEX('>', @text)
UNION ALL
SELECT
CHARINDEX('<', @text, stopIndex + 1) - (stopIndex - startIndex + 1),
CHARINDEX('>', @text, stopIndex + 1) - (stopIndex - startIndex + 1)
FROM BracketIndexes
WHERE startIndex > 0 AND stopIndex > 0
)
SELECT @text = LEFT(@text, startIndex - 1) + RIGHT(@text, LEN(@text) - stopIndex)FROM BracketIndexes WHERE startIndex > 0 AND stopIndex > 0
SELECT @text
But when when I added some more HTML statements into @text, the RIGHT had an incorrect length parameter.
You were on the right path. With each recursion of the CTE, you want to basically replace one HTML element with a blank using a combination of REPLACE() and SUBSTRING():
DECLARE @text varchar(max) = '<a href="">Cool!</a><b>oh yeah!</b>'
;WITH BracketIndexes (rowNumber, original, startIndex, stopIndex, replaced) AS (SELECT
1,
@text,
CHARINDEX('<', @text),
CHARINDEX('>', @text),
REPLACE(@text,SUBSTRING(@text,CHARINDEX('<', @text),CHARINDEX('>', @text)),'') UNION ALL SELECT
rowNumber+1,
replaced,
CHARINDEX('<', replaced),
CHARINDEX('>', replaced),
REPLACE(replaced,SUBSTRING(replaced,CHARINDEX('<', replaced),CHARINDEX('>', replaced)),'')
FROM BracketIndexes
WHERE startIndex > 0 AND stopIndex > 0
)
SELECT TOP 1
replaced
FROM
BracketIndexes
ORDER BY RowNumber DESC
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With