
Does SQL Server add a byte order mark when casting to XML?

I have this C# method that is meant to ignore the byte order mark when serializing to XML:

public static string SerializeAsXml(this object dataToSerialize)
{
   if (dataToSerialize == null) return null;

   using (var stringwriter = new StringWriter())
   {
      var serializer = new XmlSerializer(dataToSerialize.GetType());

      serializer.Serialize(stringwriter, dataToSerialize);

      var xml = stringwriter.ToString();

      var utf8 = new UTF8Encoding(false);

      var bytes = utf8.GetBytes(xml);

      xml = utf8.GetString(bytes);

      return xml;
   }
}

The result is being passed to a stored procedure and cast to XML like this: @EventMessage AS XML

This stored procedure adds it as a message to a Service Broker queue.

But when testing, the BOM is still present when the message is retrieved from the queue.

Does SQL Server add a BOM itself when casting? And if so, is there a way to prevent this?

EDIT:

I retrieve the value from the queue with this query in a FitNesse test:

var sqlSelectCommand =
            $@"SELECT message_type_name, message_body, casted_message_body = 
            CASE message_type_name WHEN 'X' 
              THEN CAST(message_body AS NVARCHAR(MAX)) 
              ELSE message_body 
            END 
            FROM {QueueName} WITH (NOLOCK)";

This is read with this:

var castedMessageBody = reader["casted_message_body"].ToString();

And I know the BOM is still present because the test needs this to pass:

   if (castedMessageBody.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))
   {
       castedMessageBody = castedMessageBody.Remove(0, _byteOrderMarkUtf8.Length);
   }

Kieran asked Mar 12 '26 04:03


1 Answer

Technically, I don't think it adds a BOM when casting to XML, since:

The data is stored in an internal representation that preserves the XML content of the data. This internal representation includes information about the containment hierarchy, document order, and element and attribute values. Specifically, the InfoSet content of the XML data is preserved.

Since the BOM is an artefact of string encodings of XML and not part of the XML Infoset, I don't think a BOM is stored.
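
To illustrate that a BOM belongs to the byte encoding rather than to the text, here is a minimal sketch. The class and method names (BomDemo, Utf8Preamble, Utf16Preamble, RoundTrip) are made up for illustration. It suggests that the UTF-8 round trip in the question cannot remove anything, because a .NET string produced by StringWriter never contained a BOM in the first place.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // The BOM is the encoding's "preamble": bytes that a stream or file
    // writer may place in front of the encoded text.
    public static byte[] Utf8Preamble() => new UTF8Encoding(true).GetPreamble(); // EF BB BF
    public static byte[] Utf16Preamble() => Encoding.Unicode.GetPreamble();      // FF FE

    // Encoding.GetBytes never emits the preamble, so for well-formed text
    // encoding to UTF-8 and decoding again returns an equal string.
    public static string RoundTrip(string xml)
    {
        var utf8 = new UTF8Encoding(false);
        return utf8.GetString(utf8.GetBytes(xml));
    }
}
```

So whatever BOM shows up later has to be introduced after the string leaves this method.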

However, if you cast the XML data to a binary or string representation in SQL Server, it appears to prefer UTF-16 with a BOM for the representation you receive.
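
If that is what is happening, the simplest workaround is probably to strip the mark on the client. The following is only a sketch under that assumption: it treats message_body as UTF-16 little-endian bytes, and the names MessageBodyText, Decode and StripBom are hypothetical helpers, not part of any library.

```csharp
using System;
using System.Text;

public static class MessageBodyText
{
    // In a .NET string, a BOM of any Unicode encoding appears as U+FEFF.
    private const char Bom = '\uFEFF';

    // Decodes a varbinary message body, assuming it holds UTF-16 LE text.
    // Encoding.GetString does not skip a leading BOM, so strip it afterwards.
    public static string Decode(byte[] messageBody)
    {
        return StripBom(Encoding.Unicode.GetString(messageBody));
    }

    // Removes a single leading U+FEFF, if present (ordinal char comparison).
    public static string StripBom(string text)
    {
        return !string.IsNullOrEmpty(text) && text[0] == Bom
            ? text.Substring(1)
            : text;
    }
}
```

StripBom can also be applied to the casted_message_body string from the question's query, since the NVARCHAR cast would carry the same leading character.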

Damien_The_Unbeliever answered Mar 14 '26 18:03