XML Serialization of an Object Containing Invalid Characters

I'm serializing an object that contains HTML data in a String property.

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Create)
Formatter.Serialize(fs, Ob)
fs.Close()

But when I'm reading the XML back to the Object:

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Open)
Dim Ob = CType(Formatter.Deserialize(fs), MyObject)
fs.Close()

I get this error:

"'', hexadecimal value 0x14, is an invalid character. Line 395, position 22."

Shouldn't .NET prevent this kind of error, escaping the invalid characters?

What's happening here and how can I fix it?

InfoStatus asked Jul 22 '09 15:07


3 Answers

I set the XmlReaderSettings property CheckCharacters to false. I would only advise doing this if you have serialized the data yourself via XmlSerializer. If it's from an unknown source then it's not really a good idea.

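// Requires: using System.IO; using System.Xml; using System.Xml.Serialization;
public static T Deserialize<T>(string xml)
{
    // Turn off character checking so references such as &#x14; are accepted.
    var xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };

    // Create is a static method defined on XmlReader; dispose the reader when done.
    using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml), xmlReaderSettings))
    {
        var xs = new XmlSerializer(typeof(T));
        return (T)xs.Deserialize(xmlReader);
    }
}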
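
To see the effect of CheckCharacters in isolation, a small reader-only sketch may help. It is not part of the original answer: CheckCharactersDemo, ReadRootText and FailsToRead are hypothetical names, and the sample assumes the serializer wrote the character as a numeric character reference such as &#x14;.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; the names are illustrative only.
public static class CheckCharactersDemo
{
    // Reads the text content of the root element using the given CheckCharacters setting.
    public static string ReadRootText(string xml, bool checkCharacters)
    {
        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();                      // position on the root element
            return reader.ReadElementContentAsString();  // text with character references expanded
        }
    }

    // Returns true when reading the document fails with an XmlException.
    public static bool FailsToRead(string xml, bool checkCharacters)
    {
        try
        {
            ReadRootText(xml, checkCharacters);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

For example, ReadRootText("<root>a&#x14;b</root>", false) returns the text with the 0x14 character intact, while the same call with true throws an XmlException like the one in the question. According to the XmlReaderSettings documentation, the setting only relaxes checking of character references, so a raw 0x14 written directly into the text would still be rejected.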

Brandon Kuehl answered Oct 20 '22 21:10


It should really have failed in the serialize step, because 0x14 is not a legal character in XML 1.0. There is no way to escape it, not even as the character reference &#x14;, since it is excluded from the set of valid characters in the XML model. I am actually surprised that the serializer lets this through, as it makes the serializer non-conforming.

Is it possible for you to remove the invalid characters from the string before serializing it? Why is there a 0x14 in the HTML in the first place?

Or, is it possible you are writing with one encoding, and reading with a different one?
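
If dropping the characters before serializing is acceptable, a filter along these lines could be applied to the string property first. This is a minimal sketch, not code from the answer: XmlSanitizer and StripInvalidXmlChars are hypothetical names, and the ranges follow the Char production of XML 1.0. It discards data, so it only fits cases where the control characters carry no meaning.

```csharp
using System;
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class XmlSanitizer
{
    // True when the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static bool IsLegalXmlCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the text without characters that XML 1.0 does not allow.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // A valid surrogate pair encodes a code point above U+FFFF; keep both halves.
                sb.Append(text[i]).Append(text[i + 1]);
                i++;
            }
            else if (IsLegalXmlCodePoint(text[i]))
            {
                sb.Append(text[i]);
            }
            // Anything else (control characters such as 0x14, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

Calling XmlSanitizer.StripInvalidXmlChars on the HTML string before assigning it to the serialized property keeps tabs and line breaks but removes characters such as 0x14.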

lavinio answered Oct 20 '22 22:10


You should really post the code of the class you're trying to serialize and deserialize. In the meantime, I'll make a guess.

Most likely, the invalid character is in a field or property of type string. Assuming you can't avoid having that character present at all, you will need to serialize that string as an array of bytes, which XmlSerializer writes out as base64 text:

[XmlRoot("root")]
public class HasBase64Content
{
    internal HasBase64Content()
    {
    }

    [XmlIgnore]
    public string Content { get; set; }

    [XmlElement]
    public byte[] Base64Content
    {
        get
        {
            // GetBytes throws on null, so pass a null Content through as null.
            return Content == null ? null : System.Text.Encoding.UTF8.GetBytes(Content);
        }
        set
        {
            if (value == null)
            {
                Content = null;
                return;
            }

            Content = System.Text.Encoding.UTF8.GetString(value);
        }
    }
}

This produces XML like the following:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Base64Content>AAECAwQFFA==</Base64Content>
</root>

I see you'd probably prefer VB.NET:

<XmlRoot("root")> _
Public Class HasBase64Content

    Private _content As String
    <XmlIgnore()> _
    Public Property Content() As String
        Get
            Return _content
        End Get
        Set(ByVal value As String)
            _content = value
        End Set
    End Property

    <XmlElement()> _
    Public Property Base64Content() As Byte()
        Get
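            ' GetBytes throws on Nothing, so pass a Nothing Content through unchanged.
            If Content Is Nothing Then Return Nothing
            Return System.Text.Encoding.UTF8.GetBytes(Content)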
        End Get
        Set(ByVal value As Byte())
            If Value Is Nothing Then
                Content = Nothing
                Return
            End If
            Content = System.Text.Encoding.UTF8.GetString(Value)
        End Set
    End Property
End Class

John Saunders answered Oct 20 '22 21:10