So I got a varbinary(max)
column in SQL Server 2005 and it's full of XML. A few records somewhere have truncated XML so they're invalid.
This means if I run a
SELECT CAST(myVarbinaryColumn as XML) ...
it blows chunks.
How can I filter out/skip invalid xml ?
When I've done similar with a varchar that supposedly has dates I could use ISDATE(blah) = 1
. So an equivalent ISVALIDXML()
would be nice.
Please don't comment about "Why isn't the column XML datatype anyway.." This happened in the past and I don't have a time machine.
varbinary [ ( n | max) ] Variable-length binary data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of the data entered + 2 bytes.
INSERT INTO #TempTable(PK, VarBinaryColumn) SELECT PK, VarBinaryColumn FROM dbo. YourPermanentTable; If you need to convert the varbinary data back to the original file text format in T-SQL, you can use CAST or CONVERT to convert to varchar or nvarchar as long as the data was originally ASCII or Unicode.
I think your best bet would be to write a custom CLR function, perhaps using XmlDocument.Load. In the CLR you could trap the error on a failed load and return an appropriate result.
EDIT: The code below would also work although it's not as elegant as a UDF. Unfortunately, we can't use TRY/CATCH in a UDF.
create procedure dbo.usp_IsValidXML(@XMLCandidate varbinary(max), @Return bit output)
as
begin
declare @x xml
begin try
set @x = cast(@XMLCandidate as xml)
set @Return = 1
end try
begin catch
set @Return = 0
end catch
end
go
declare @test1 varbinary(max)
declare @test2 varbinary(max)
set @test1 = cast('<data>asdf</data>' as varbinary(max))
set @test2 = cast('<data>asdf</da' as varbinary(max))
declare @IsValid bit
exec dbo.usp_IsValidXML @test1, @IsValid output
select @IsValid
exec dbo.usp_IsValidXML @test2, @IsValid output
select @IsValid
drop procedure dbo.usp_IsValidXML
I wish I had that IsValidXML()
function, too..... unfortunately, I don't think there's anything like that.....
Just a thought: is there anything you could check for to filter out the invalid XML??
E.g. could you filter out all those strings that don't end in .....</data>
or something like that?? (seeing that you say your invalid XML is typically truncated XML, so I would think the closing tag - </data>
or whatever - would be missing in those cases).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With