Filter Illegal XML Characters in .NET

Tags: .net, sql, xml

I have a stored procedure in MS SQL 2005 that returns XML. I call it with SqlCommand.ExecuteXmlReader to get an XmlReader, then parse the data and build an XML document. The problem is that the data in SQL contains some binary characters that are illegal in a UTF-8 XML document, so an exception is thrown.

Has anyone else dealt with this problem? I've considered filtering the data on input into the DB, but then I'd have to put the filtering everywhere, and every character would need to be checked.

Any other suggestions?

EDIT: The data is typically stored in varchar columns of various lengths. The data is actually input by users on web forms (ASP.NET app), so sometimes they copy-paste from MS Word or something similar and it puts these strange binary characters in.

Brandon Montgomery asked Apr 29 '09 12:04


1 Answer

I have seen the .NET SqlClient "scramble" data from nvarchar columns in the database. Our theory was that it has something to do with "surrogate code points"; see:

http://www.siao2.com/2005/07/27/444101.aspx

http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=rzaaxsurrogate.htm

http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0004816.htm

SqlClient seemed to "interpret" some of the bytes, meaning that our XML was no longer well formed. Converting to NVARCHAR(MAX) seemed to stop this (although it did have a performance impact):

SELECT CONVERT(NVARCHAR(MAX), MyValue) FROM ...

Note that you need to use NVARCHAR(MAX); NVARCHAR(N) doesn't work.

We also found that the OLE DB provider works correctly (although it is slower than SqlClient).
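If the data still contains characters that XML 1.0 does not allow (for example stray control characters pasted into a web form), one option the question raises is to filter the text in .NET before it goes into the document. The following is only a minimal sketch of that idea, not something from the original answer: the class and method names are hypothetical, and it assumes that silently dropping disallowed characters is acceptable. It checks each UTF-16 code unit against the Char production from the XML 1.0 specification and keeps well-formed surrogate pairs intact. It does not address the SqlClient encoding behaviour described above.

```csharp
using System;
using System.Text;

// Hypothetical helper; the name and shape are illustrative only.
public static class XmlTextSanitizer
{
    // Returns a copy of 'text' without characters outside the XML 1.0
    // Char production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        StringBuilder result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // A well-formed surrogate pair encodes a code point in
            // [#x10000-#x10FFFF], which XML 1.0 allows; keep both halves.
            if (char.IsSurrogatePair(text, i))
            {
                result.Append(c).Append(text[i + 1]);
                i++;
                continue;
            }

            // Lone surrogates (#xD800-#xDFFF) fall outside both ranges
            // below, so they are dropped along with control characters.
            bool allowed =
                c == '\t' || c == '\n' || c == '\r' ||
                (c >= '\u0020' && c <= '\uD7FF') ||
                (c >= '\uE000' && c <= '\uFFFD');

            if (allowed)
            {
                result.Append(c);
            }
        }

        return result.ToString();
    }
}
```

A call such as XmlTextSanitizer.StripInvalidXmlChars(value) could be applied to each string just before it is written into the XML document, which keeps the filtering in one place instead of at every input form.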

Justin answered Oct 20 '22 10:10