How to solve "unable to switch the encoding" error when inserting XML into SQL Server

I'm trying to insert into an XML column (SQL Server 2008 R2), but the server complains:

System.Data.SqlClient.SqlException (0x80131904):
XML parsing: line 1, character 39, unable to switch the encoding

I found out that the XML column has to be UTF-16 in order for the insert to succeed.

The code I'm using is:

    XmlSerializer serializer = new XmlSerializer(typeof(MyMessage));
    StringWriter str = new StringWriter();
    serializer.Serialize(str, message);
    string messageToLog = str.ToString();

How can I serialize the object to a UTF-8 string?

EDIT: Ok, sorry for the mixup - the string needs to be in UTF-8. You were right - it's UTF-16 by default, and if I try to insert in UTF-8 it passes. So the question is how to serialize into UTF-8.

Example

This causes errors while trying to insert into SQL Server:

    <?xml version="1.0" encoding="utf-16"?>
    <MyMessage>Teno</MyMessage>

This doesn't:

    <?xml version="1.0" encoding="utf-8"?>
    <MyMessage>Teno</MyMessage>
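
For context, the encoding written into the declaration comes from the TextWriter handed to XmlSerializer: a plain StringWriter reports UTF-16, which is why the first form appears. The following is a minimal sketch, not a definitive fix; `Utf8StringWriter` and `XmlDeclarationDemo` are hypothetical names introduced here for illustration. Note that the resulting .NET string is still UTF-16 in memory; only the declared encoding changes.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper (not from the original post): a StringWriter that
// reports UTF-8, so the serializer writes encoding="utf-8" in the declaration.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class XmlDeclarationDemo
{
    // Serializes value to a string; declareUtf8 selects which writer is used
    // and therefore which encoding name ends up in the XML declaration.
    public static string Serialize<T>(T value, bool declareUtf8)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringWriter writer = declareUtf8 ? new Utf8StringWriter() : new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

Calling `XmlDeclarationDemo.Serialize(message, true)` should yield a string whose declaration names utf-8, while passing `false` keeps the default utf-16 declaration.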

Update

I figured out when SQL Server 2008 needs utf-8 and when it needs utf-16 in the encoding attribute of the XML declaration you insert into an Xml column:

To insert XML declared as utf-8, add the parameter to the SQL command like this:

    sqlcmd.Parameters.Add("ParamName", SqlDbType.VarChar).Value = xmlValueToAdd;

If xmlValueToAdd declares encoding=utf-16, the line above produces an error on insert. Also, VarChar means that national characters aren't preserved (they turn out as question marks).

To insert XML declared as utf-16, either use SqlDbType.NVarChar or SqlDbType.Xml in the previous example, or don't specify a type at all:

    sqlcmd.Parameters.Add(new SqlParameter("ParamName", xmlValueToAdd));

veljkoz asked Sep 21 '10 13:09


1 Answer

This question is a near-duplicate of 2 others, and surprisingly - while this one is the most recent - I believe it is missing the best answer.

The duplicates, and what I believe to be their best answers, are:

  • Using StringWriter for XML Serialization (2009-10-14): https://stackoverflow.com/a/1566154/751158
  • Trying to store XML content into SQL Server 2005 fails (encoding problem) (2008-12-21): https://stackoverflow.com/a/1091209/751158

In the end, it doesn't matter what encoding is declared or used, as long as the XmlReader can parse it locally within the application server.

As was confirmed in Most efficient way to read XML in ADO.net from XML type column in SQL server?, SQL Server stores XML in an efficient binary format. By using the SqlXml class, ADO.net can communicate with SQL Server in this binary format, and not require the database server to do any serialization or de-serialization of XML. This should also be more efficient for transport across the network.

By using SqlXml, XML will be sent pre-parsed to the database, and then the DB doesn't need to know anything about character encodings - UTF-16 or otherwise. In particular, note that the XML declarations aren't even persisted with the data in the database, regardless of which method is used to insert it.

Please refer to the above-linked answers for methods that look very similar to this, but this example is mine:

    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;
    using System.Xml;

    static class XmlDemo {
        static void Main(string[] args) {
            using(SqlConnection conn = new SqlConnection()) {
                conn.ConnectionString = "...";
                conn.Open();

                using(SqlCommand cmd = new SqlCommand("Insert Into TestData(Xml) Values (@Xml)", conn)) {

                    cmd.Parameters.Add(new SqlParameter("@Xml", SqlDbType.Xml) {
                        // Works.
                        // Value = "<Test/>"

                        // Works.  XML Declaration is not persisted!
                        // Value = "<?xml version=\"1.0\"?><Test/>"

                        // Works.  XML Declaration is not persisted!
                        // Value = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><Test/>"

                        // Error ("unable to switch the encoding" SqlException).
                        // Value = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Test/>"

                        // Works.  XML Declaration is not persisted!
                        Value = new SqlXml(XmlReader.Create(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Test/>")))
                    });

                    cmd.ExecuteNonQuery();
                }
            }
        }
    }

Note that I would not consider the last (non-commented) example to be "production-ready", but left it as-is to be concise and readable. If done properly, both the StringReader and the created XmlReader should be initialized within using statements to ensure that their Close() methods are called when complete.
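
As a rough illustration of that note, the same insert could be arranged so that both readers live inside using statements. This is only a sketch under assumptions: `XmlInsertSketch`, `CreateXmlParameter` and `Insert` are hypothetical names, and the `TestData(Xml)` table and an already-open connection are taken from the example above.

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Xml;

public static class XmlInsertSketch
{
    // Hypothetical helper: wraps an XmlReader in an XML-typed parameter.
    public static SqlParameter CreateXmlParameter(string name, XmlReader reader)
    {
        return new SqlParameter(name, SqlDbType.Xml) { Value = new SqlXml(reader) };
    }

    // Assumes conn is already open and that the TestData(Xml) table exists.
    public static void Insert(SqlConnection conn, string xml)
    {
        using (StringReader text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text))
        using (SqlCommand cmd = new SqlCommand("Insert Into TestData(Xml) Values (@Xml)", conn))
        {
            cmd.Parameters.Add(CreateXmlParameter("@Xml", reader));

            // Execute while both readers are still open; they are closed
            // when the using blocks exit, even if the insert throws.
            cmd.ExecuteNonQuery();
        }
    }
}
```

Running the command inside the using blocks avoids depending on exactly when SqlXml consumes the reader.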

From what I've seen, the XML declarations are never persisted when using an XML column. Even without using .NET and just using this direct SQL insert statement, for example, the XML declaration is not saved into the database with the XML:

    Insert Into TestData(Xml) Values ('<?xml version="1.0" encoding="UTF-8"?><Test/>');

Now in terms of the OP's question, the object to be serialized still needs to be converted into an XML structure from the MyMessage object, and XmlSerializer is still needed for this. However, at worst, instead of serializing to a String, the message could instead be serialized to an XmlDocument - which can then be passed to SqlXml through a new XmlNodeReader - avoiding a de-serialization/serialization trip to a string. (See http://blogs.msdn.com/b/jongallant/archive/2007/01/30/how-to-convert-xmldocument-to-xmlreader-for-sqlxml-data-type.aspx for details and an example.)
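
The XmlDocument route described above might look roughly like the following. This is a sketch rather than the linked article's code: the `MyMessage` class here is a stand-in for the OP's type (its `Text` property is an assumption), and `XmlDocumentSketch` with its two methods is a hypothetical helper.

```csharp
using System.Data.SqlTypes;
using System.Xml;
using System.Xml.Serialization;

// Stand-in for the OP's message type; the Text property is an assumption.
public class MyMessage
{
    public string Text { get; set; }
}

public static class XmlDocumentSketch
{
    // Serializes straight into a DOM, without an intermediate string.
    public static XmlDocument ToXmlDocument<T>(T value)
    {
        XmlDocument doc = new XmlDocument();
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (XmlWriter writer = doc.CreateNavigator().AppendChild())
        {
            serializer.Serialize(writer, value);
        }
        return doc;
    }

    // Exposes the DOM to ADO.NET through an XmlNodeReader.
    public static SqlXml ToSqlXml(XmlDocument doc)
    {
        return new SqlXml(new XmlNodeReader(doc));
    }
}
```

The resulting SqlXml can then be assigned as the Value of a SqlDbType.Xml parameter, as in the example above.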

Everything here was developed against and tested with .NET 4.0 and SQL Server 2008 R2.

Please don't waste work by running XML through extra conversions (de-serializations and serializations - to DOM, strings, or otherwise), as shown in other answers here and elsewhere.

ziesemer answered Sep 29 '22 19:09