I have a stored procedure in MS SQL 2005 that returns XML. I call it with SqlCommand.ExecuteXmlReader to get an XmlReader, then parse through the data and build an XML document. The problem is that the data in SQL contains some binary characters that are illegal within a UTF-8 XML document, so an exception is thrown.
Has anyone else dealt with this problem? I've considered filtering the data on input into the DB, but then I'd have to put the filtering everywhere, and every character would need to be checked.
Any other suggestions?
EDIT: The data is typically stored in varchar columns of various lengths. It is entered by users on web forms (ASP.NET app), so sometimes they copy and paste from MS Word or something similar, and that puts these strange binary characters in.
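For reference, the filtering idea mentioned above can be sketched in one place rather than at every input point. This is only a minimal, language-neutral illustration in Python, not the code from the app: the function name strip_illegal_xml_chars and the sample string are hypothetical, and it assumes the offending characters are the ones excluded by the XML 1.0 "Char" production (control characters other than tab, LF and CR, unpaired surrogates, U+FFFE and U+FFFF).

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text, replacement=""):
    """Return text with characters that XML 1.0 forbids replaced (removed by default)."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    # Hypothetical sample: NUL, vertical tab and unit separator pasted in with the text.
    sample = "Pasted\x00 from\x0b Word\x1f: caf\u00e9\n"
    print(repr(strip_illegal_xml_chars(sample)))
```

Applying a filter like this once, at the point where the XML is built, avoids having to repeat the check on every form that writes to the database. The same character ranges would carry over to a check written in any other language.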
I have seen the .NET SqlClient "scramble" data from nvarchar columns in the database. Our theory was that it has something to do with "surrogate code points"; see:
http://www.siao2.com/2005/07/27/444101.aspx
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=rzaaxsurrogate.htm
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0004816.htm
SqlClient seemed to "interpret" some of the bytes, meaning that our XML was no longer well formed. Converting to nvarchar(max) seemed to stop this (although it did have a performance impact):
SELECT CONVERT(NVARCHAR(MAX), MyValue) FROM ...
Note that you need to use NVARCHAR(MAX); NVARCHAR(N) doesn't work.
We also found that the OleDB provider works correctly (although it is slower than SqlClient).