I am attempting to process a large XML document (using an `XmlReader`) in a single pass, and deserialize only certain elements in it using an `XmlSerializer`.
Below is some code and a tiny mock XML document showing how I have attempted to do this.
Rationale for using `XmlReader`:

1. I am dealing with very large XML documents (10–250 MB), which I therefore do not want to load into memory. So `XmlDocument` is out of the question.
2. I want to extract only certain elements. Typically I will be able to ignore most other content. `XmlReader` appears to give me an efficient means of skipping irrelevant content.
3. I do not know in advance whether any and all elements that I can deal with will be present. Therefore I am not using a bunch of XPath/XQuery or LINQ to XML-based queries, because I want to make only a single pass over the XML files (due to their size).
```csharp
public class ElementOfInterest { }
…
var xml = @"<?xml version='1.0' encoding='utf-8' ?>
<Root xmlns:ex='urn:stakx:example'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<ElementOfInterest xsi:type='ex:ElementOfInterest' />
</Root>";
var reader = System.Xml.XmlReader.Create(new System.IO.StringReader(xml));
reader.ReadToFollowing("ElementOfInterest");
var serializer = new System.Xml.Serialization.XmlSerializer(typeof(ElementOfInterest));
serializer.Deserialize(reader.ReadSubtree());
```
The last line of code throws the following inner exception:

`InvalidOperationException`: "Namespace prefix `ex` is not defined."
Obviously, the `XmlSerializer` doesn't recognise the `ex` namespace prefix inside the `xsi:type` attribute's value.
This is just one error I am having, but frankly, the larger problem is that I have no idea how to go about the whole namespace issue. I am simply looking for a convenient way to deserialize just a single node out of the XML document, but that seems to entail having to manually register/manage namespaces and to somehow forward them from the `XmlReader` to the `XmlSerializer`.
Can someone demonstrate how to deserialize a single node from an XML document read with an `XmlReader`, either by pointing out the error in my code or by showing an alternative approach?
As with serialization, you must first construct an `XmlSerializer`, passing the type of the class to be deserialized to the constructor. Then call its `Deserialize` method with a source for the XML document as the argument. A `FileStream` is one option; `Deserialize` also has overloads that accept a `TextReader` or an `XmlReader`.
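A minimal sketch of that construct-then-`Deserialize` pattern using a `FileStream`. The `PurchaseOrder` class and the `po.xml` file name are hypothetical placeholders; the sketch writes the file itself first so that it runs on its own.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type standing in for whatever class the XML maps to.
public class PurchaseOrder
{
    public string ShipTo;
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical file; created here only so the example is self-contained.
        const string path = "po.xml";
        File.WriteAllText(path,
            "<?xml version='1.0' encoding='utf-8'?>" +
            "<PurchaseOrder><ShipTo>Jane Doe</ShipTo></PurchaseOrder>");

        // 1. Construct the serializer, passing the type to deserialize.
        var serializer = new XmlSerializer(typeof(PurchaseOrder));

        // 2. Open the document as a stream and hand it to Deserialize.
        PurchaseOrder po;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            po = (PurchaseOrder)serializer.Deserialize(fs);
        }

        Console.WriteLine(po.ShipTo); // prints "Jane Doe"
    }
}
```

Note that a `FileStream` hands the whole document to the serializer, so this pattern alone does not address the question's need to pick out single elements from a large file.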
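Yes. Rather than ignoring namespaces, you can make them explicit: declare the `ex` prefix up front in an `XmlParserContext` passed to `XmlReader.Create`, put the element itself in the `urn:stakx:example` namespace, and give the class a matching `[XmlRoot(Namespace = ...)]`.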
The following works:
```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

class Program
{
    static void Main()
    {
        var xml = @"<?xml version='1.0' encoding='utf-8' ?>
<Root
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xmlns:ex='urn:stakx:example'
>
<ex:ElementOfInterest xsi:type='ex:ElementOfInterest' />
</Root>";

        // Pre-declare the "ex" prefix in a parser context handed to the reader.
        var nt = new NameTable();
        var mgr = new XmlNamespaceManager(nt);
        mgr.AddNamespace("ex", "urn:stakx:example");
        var ctxt = new XmlParserContext(nt, mgr, "", XmlSpace.Default);
        var reader = XmlReader.Create(new StringReader(xml), null, ctxt);

        var serializer = new XmlSerializer(typeof(ElementOfInterest));

        // Find the element by local name *and* namespace URI, then
        // deserialize only that subtree.
        if (reader.ReadToFollowing("ElementOfInterest", "urn:stakx:example"))
        {
            var eoi = (ElementOfInterest)serializer.Deserialize(reader.ReadSubtree());
        }
    }
}

// The class lives in the same namespace as the element it maps to.
[XmlRoot(Namespace = "urn:stakx:example")]
public class ElementOfInterest { }
```
Note the namespace in the input, `<ex:ElementOfInterest>`, which matches the `Namespace` given in the class's `[XmlRoot]` attribute.
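Since the question is about a single pass over a large file, the same `ReadToFollowing` + `ReadSubtree` idea can be put in a loop that yields every matching element. This is a sketch, not part of the answer above: the `StreamingExample`/`ReadAll` names and the `id` attribute are hypothetical, and the sample input uses no `xsi:type`, so it does not exercise the prefix problem from the question. The key detail is disposing each subtree reader before the outer reader moves on, because `ReadSubtree` requires that no operations be performed on the original reader until the subtree reader is closed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot(Namespace = "urn:stakx:example")]
public class ElementOfInterest
{
    // Hypothetical attribute, present only so the sketch has data to show.
    [XmlAttribute("id")]
    public string Id;
}

public static class StreamingExample
{
    // Yields each ElementOfInterest in document order during one forward pass.
    public static IEnumerable<ElementOfInterest> ReadAll(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(ElementOfInterest));
        using (var reader = XmlReader.Create(input))
        {
            // ReadToFollowing skips over every node that is not a match.
            while (reader.ReadToFollowing("ElementOfInterest", "urn:stakx:example"))
            {
                // Closing the subtree reader leaves `reader` at the end of the
                // element, so the next ReadToFollowing continues from there.
                using (var subtree = reader.ReadSubtree())
                {
                    yield return (ElementOfInterest)serializer.Deserialize(subtree);
                }
            }
        }
    }

    public static void Main()
    {
        var xml = @"<?xml version='1.0' encoding='utf-8' ?>
<Root xmlns:ex='urn:stakx:example'>
  <ex:ElementOfInterest id='a' />
  <Other>ignored</Other>
  <ex:ElementOfInterest id='b'></ex:ElementOfInterest>
</Root>";

        foreach (var eoi in ReadAll(new StringReader(xml)))
            Console.WriteLine(eoi.Id);
    }
}
```

Only one element is materialized at a time, so memory use stays bounded by the size of the largest element of interest rather than the whole document.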