Treating strings for insertion into XElement

We gather lots of strings and send them to our clients in XML fragments. These strings can contain literally any character. We've been seeing an error when serializing XElement instances that contain "bad" characters. Here's an example:

var message = new XElement("song");
char c = (char)0x1a; //sub
var someData = string.Format("some{0}stuff", c);
var attr = new XAttribute("someAttr", someData);
message.Add(attr);
string msgStr = message.ToString(SaveOptions.DisableFormatting); //exception here

The code above throws an exception at the indicated line. Here's the stack trace:

'SUB', hexadecimal value 0x1A, is an invalid character. System.ArgumentException System.ArgumentException: '', hexadecimal value 0x1A, is an invalid character.
   at System.Xml.XmlEncodedRawTextWriter.InvalidXmlChar(Int32 ch, Char* pDst, Boolean entitize)
   at System.Xml.XmlEncodedRawTextWriter.WriteAttributeTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlEncodedRawTextWriter.WriteString(String text)
   at System.Xml.XmlWellFormedWriter.WriteString(String text)
   at System.Xml.XmlWriter.WriteAttributeString(String prefix, String localName, String ns, String value)
   at System.Xml.Linq.ElementWriter.WriteStartElement(XElement e)
   at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
   at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
   at System.Xml.Linq.XNode.GetXmlString(SaveOptions o)

My suspicion is that this is not the correct behaviour and the bad char should be escaped into the XML. Whether this is desirable or not is a question I will answer later.

So here's the question:

Is there some way of treating strings such that this error might not occur, or should I simply strip all chars below char 0x20 and cross my fingers?

spender asked Oct 17 '12 21:10


1 Answer

A little digging with ILSpy revealed that the CheckCharacters property on XmlWriterSettings and XmlReaderSettings controls whether an exception is thrown for invalid characters. Borrowing from the XNode.ToString method and the XDocument.Parse method, I've come up with the following examples:

To stringify an XLinq object with invalid (control) characters:

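// Requires: using System.IO; using System.Xml; using System.Xml.Linq;
// This is the body of a method that returns string.
XDocument xdoc = XDocument.Parse("<root>foo</root>");
using (StringWriter stringWriter = new StringWriter())
{
    // CheckCharacters = false stops the writer from throwing on invalid characters.
    XmlWriterSettings xmlWriterSettings = new XmlWriterSettings { OmitXmlDeclaration = true, CheckCharacters = false };
    using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, xmlWriterSettings))
    {
        xdoc.WriteTo(xmlWriter);
    }

    // The XmlWriter has been disposed (and flushed) by this point.
    return stringWriter.ToString();
}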
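
Applied to the XElement from the question, the same writer settings could be wrapped in a small helper. This is only a minimal sketch: LenientXmlWriter and ToLenientString are hypothetical names chosen for illustration, not part of System.Xml.Linq.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; the class and method names are illustrative only.
public static class LenientXmlWriter
{
    // Serializes an element with character checking turned off, so control
    // characters such as 0x1A do not raise ArgumentException.
    public static string ToLenientString(XElement element)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            CheckCharacters = false
        };

        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                element.WriteTo(xmlWriter);
            }

            // The XmlWriter is disposed (and flushed) before the text is read back.
            return stringWriter.ToString();
        }
    }
}
```

Calling LenientXmlWriter.ToLenientString(message) on the element from the question should then return a string instead of throwing. The result is not guaranteed to be well-formed XML 1.0, so whoever consumes it needs a similarly lenient reader.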

To parse XML text with invalid characters into an XLinq object:

// text holds the XML string to parse.
XDocument xdoc;
using (StringReader stringReader = new StringReader(text))
{
    // CheckCharacters = false relaxes character validation; the other settings are borrowed from XDocument.Parse.
    XmlReaderSettings xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false, DtdProcessing = DtdProcessing.Parse, MaxCharactersFromEntities = 10000000L, XmlResolver = null };
    using (XmlReader xmlReader = XmlReader.Create(stringReader, xmlReaderSettings))
    {
        xdoc = XDocument.Load(xmlReader);
    }
}
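
As a quick check of the reader side, the sketch below parses a fragment whose attribute holds a character reference to 0x1A. LenientXmlReader and ParseLenient are hypothetical names, and the input used in the note that follows is made up for illustration. One caveat, going by the XmlReaderSettings.CheckCharacters documentation: on a reader the setting relaxes checking of character entity references only, so a raw, unescaped 0x1A in the text would presumably still be rejected.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; the class and method names are illustrative only.
public static class LenientXmlReader
{
    // Loads an element from text using a reader that does not validate
    // the characters produced by character entity references.
    public static XElement ParseLenient(string text)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };

        using (var stringReader = new StringReader(text))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return XElement.Load(xmlReader);
        }
    }
}
```

With this in place, LenientXmlReader.ParseLenient("<song someAttr=\"some&#x1A;stuff\" />") should yield an element whose someAttr value contains the 0x1A character again, whereas passing the same string to XElement.Parse fails with an XmlException.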

Aeon answered Sep 19 '22 11:09