Correct XML serialization and deserialization of "mixed" types in .NET

My current task involves writing a class library for processing HL7 CDA files.
These HL7 CDA files are XML files with a defined XML schema, so I used xsd.exe to generate .NET classes for XML serialization and deserialization.

The XML Schema contains various types which contain the mixed="true" attribute, specifying that an XML node of this type may contain normal text mixed with other XML nodes.
The relevant part of the XML schema for one of these types looks like this:

<xs:complexType name="StrucDoc.Paragraph" mixed="true">
    <xs:sequence>
        <xs:element name="caption" type="StrucDoc.Caption" minOccurs="0"/>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="br" type="StrucDoc.Br"/>
            <xs:element name="sub" type="StrucDoc.Sub"/>
            <xs:element name="sup" type="StrucDoc.Sup"/>
            <!-- ...other possible nodes... -->
        </xs:choice>
    </xs:sequence>
    <xs:attribute name="ID" type="xs:ID"/>
    <!-- ...other attributes... -->
</xs:complexType>

The generated code for this type looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string[] textField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlTextAttribute()]
    public string[] Text {
        get {
            return this.textField;
        }
        set {
            this.textField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

If I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the item and text arrays are read like this:

itemsField = new object[]
{
    new StrucDocBr(),
    new StrucDocBr(),
};
textField = new string[]
{
    "first line",
    "third line",
};

From these two arrays there is no way to determine the original order of the text and the other nodes.
If I serialize this again, the result looks like this:

<paragraph>
    <br />
    <br />first linethird line
</paragraph>

The default serializer just serializes the items first and then the text.
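
The behaviour described above can be reproduced with a much smaller class. The following is a minimal sketch, not the generated HL7 code: `Br`, `SplitParagraph`, and `OrderLossDemo` are hypothetical stand-ins, and the `urn:hl7-org:v3` namespace is left out to keep the example short.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the generated StrucDocBr class.
public class Br { }

// Hypothetical, trimmed-down version of the generated paragraph class:
// child elements and text live in two separate arrays.
[XmlRoot("paragraph")]
public class SplitParagraph
{
    [XmlElement("br", typeof(Br))]
    public object[] Items { get; set; }

    [XmlText]
    public string[] Text { get; set; }
}

public static class OrderLossDemo
{
    public static SplitParagraph Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(SplitParagraph));
        using (var reader = new StringReader(xml))
        {
            return (SplitParagraph)serializer.Deserialize(reader);
        }
    }

    public static string Write(SplitParagraph paragraph)
    {
        var serializer = new XmlSerializer(typeof(SplitParagraph));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", ""); // suppress the default xsi/xsd declarations
        var output = new StringWriter();
        using (var writer = XmlWriter.Create(output, settings))
        {
            serializer.Serialize(writer, paragraph, namespaces);
        }
        return output.ToString();
    }

    public static void Main()
    {
        var paragraph = Read("<paragraph>first line<br /><br />third line</paragraph>");
        Console.WriteLine("elements: " + paragraph.Items.Length);
        Console.WriteLine("text nodes: " + paragraph.Text.Length);
        // The elements are written before the text, so the original order is lost.
        Console.WriteLine(Write(paragraph));
    }
}
```

Because the two arrays carry no positional information relative to each other, nothing in the object graph records where a text node sat between the elements.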

I tried implementing IXmlSerializable on the StrucDocParagraph class so that I could control the deserialization and serialization of the content. It is rather complex because so many classes are involved, and I have not reached a solution yet because I don't know whether the effort will pay off.

Is there some kind of easy workaround to this problem, or is it even possible by doing custom serialization via IXmlSerializable? Or should I just use XmlDocument or XmlReader/XmlWriter to process these documents?

Stefan Podskubka asked Apr 02 '10 15:04


2 Answers

To solve this problem I had to modify the generated classes:

  1. Move the XmlTextAttribute from the Text property to the Items property and add the parameter Type = typeof(string)
  2. Remove the Text property
  3. Remove the textField field

As a result the generated code (modified) looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    [System.Xml.Serialization.XmlTextAttribute(typeof(string))]
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

Now if I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the item array is read like this:

itemsField = new object[]
{
    "first line",
    new StrucDocBr(),
    new StrucDocBr(),
    "third line",
};

This is exactly what I need: the order of the items and their content is correct.
If I serialize this again, the result is also correct:

<paragraph>first line<br /><br />third line</paragraph>

What pointed me in the right direction was the answer by Guillaume; I had also thought that it must be possible this way. Then I found this in the MSDN documentation for XmlTextAttribute:

You can apply the XmlTextAttribute to a field or property that returns an array of strings. You can also apply the attribute to an array of type Object but you must set the Type property to string. In that case, any strings inserted into the array are serialized as XML text.

So serialization and deserialization work correctly now, but I don't know whether there are any other side effects. It may no longer be possible to generate a schema from these classes with xsd.exe, but I don't need that anyway.
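
For anyone who wants to try the attribute change in isolation, here is a minimal round-trip sketch. It is not the generated HL7 code: `Br`, `Paragraph`, and `MixedContentDemo` are hypothetical stand-ins, and the `urn:hl7-org:v3` namespace is omitted for brevity.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the generated StrucDocBr class.
public class Br { }

// Hypothetical, trimmed-down paragraph class with the modification applied:
// text nodes and child elements share a single array.
[XmlRoot("paragraph")]
public class Paragraph
{
    [XmlElement("br", typeof(Br))]
    [XmlText(typeof(string))]
    public object[] Items { get; set; }
}

public static class MixedContentDemo
{
    public static Paragraph Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(Paragraph));
        using (var reader = new StringReader(xml))
        {
            return (Paragraph)serializer.Deserialize(reader);
        }
    }

    public static string Write(Paragraph paragraph)
    {
        var serializer = new XmlSerializer(typeof(Paragraph));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", ""); // suppress the default xsi/xsd declarations
        var output = new StringWriter();
        using (var writer = XmlWriter.Create(output, settings))
        {
            serializer.Serialize(writer, paragraph, namespaces);
        }
        return output.ToString();
    }

    public static void Main()
    {
        var paragraph = Read("<paragraph>first line<br /><br />third line</paragraph>");
        foreach (var item in paragraph.Items)
        {
            // Strings are text nodes; everything else is a child element.
            Console.WriteLine(item is string text
                ? "text: " + text
                : "element: " + item.GetType().Name);
        }
        Console.WriteLine(Write(paragraph));
    }
}
```

Code that consumes `Items` has to check each entry's runtime type, since strings and element objects are now interleaved in one `object[]`.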

Stefan Podskubka answered Oct 25 '22 23:10


I had the same problem as this, and came across this solution of altering the .cs generated by xsd.exe. Although it did work, I wasn't comfortable with altering the generated code, as I would need to remember to do it any time I regenerated the classes. It also led to some awkward code which had to test for and cast to XmlNode[] for the mailto elements.

My solution was to rethink the xsd. I ditched the use of the mixed type, and essentially defined my own mixed type.

I had this

XML: <text>some text <mailto>[email protected]</mailto>some more text</text>

<xs:complexType name="text" mixed="true">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="mailto" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

and changed to

XML: <mytext><text>some text </text><mailto>[email protected]</mailto><text>some more text</text></mytext>

<xs:complexType name="mytext">
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="text">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string" />
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <xs:element name="mailto">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string" />
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>

My generated code now gives me a class myText:

public partial class myText{

    private object[] itemsField;

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("mailto", typeof(myTextTextMailto))]
    [System.Xml.Serialization.XmlElementAttribute("text", typeof(myTextText))]
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }
}

The order of the elements is now preserved during serialization and deserialization, but I do have to test for, cast to, and program against the types myTextTextMailto and myTextText.
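
To illustrate what that type testing looks like, here is a minimal sketch. The classes `TextRun`, `Mailto`, `MyText`, and `WrapperDemo` are hand-written, hypothetical stand-ins for the generated ones, and the address `user@example.com` is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-ins for the generated element types; each one
// exposes its simple content through a Value property.
public class TextRun
{
    [XmlText]
    public string Value { get; set; }
}

public class Mailto
{
    [XmlText]
    public string Value { get; set; }
}

[XmlRoot("mytext")]
public class MyText
{
    [XmlElement("text", typeof(TextRun))]
    [XmlElement("mailto", typeof(Mailto))]
    public object[] Items { get; set; }
}

public static class WrapperDemo
{
    public static MyText Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(MyText));
        using (var reader = new StringReader(xml))
        {
            return (MyText)serializer.Deserialize(reader);
        }
    }

    // Walks the items in document order and dispatches on their runtime type.
    public static List<string> Describe(MyText text)
    {
        var parts = new List<string>();
        foreach (var item in text.Items)
        {
            switch (item)
            {
                case TextRun run:
                    parts.Add("text: " + run.Value);
                    break;
                case Mailto mailto:
                    parts.Add("mailto: " + mailto.Value);
                    break;
            }
        }
        return parts;
    }

    public static void Main()
    {
        var text = Read("<mytext><text>some text </text><mailto>user@example.com</mailto><text>some more text</text></mytext>");
        foreach (var part in Describe(text))
        {
            Console.WriteLine(part);
        }
    }
}
```

The trade-off is that the document format itself changes: every run of text needs its own wrapper element, which is only an option when you control the schema.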

Just thought I'd throw that in as an alternative approach which worked for me.

Feenster answered Oct 25 '22 21:10