Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing XmlDocument for equality (content wise)

If I want to compare the contents of a XMlDocument, is it just like this?

XmlDocument doc1 = GetDoc1();
XmlDocument doc2 = GetDoc2();

if(doc1 == doc2)
{

}

I am not checking if they are both the same object reference, but if the CONTENTS of the xml are the same.

like image 735
Blankman Avatar asked May 27 '10 19:05

Blankman


People also ask

What is the difference between XDocument and XmlDocument?

XDocument is from the LINQ to XML API, and XmlDocument is the standard DOM-style API for XML. If you know DOM well, and don't want to learn LINQ to XML, go with XmlDocument . If you're new to both, check out this page that compares the two, and pick which one you like the looks of better.


2 Answers

Try the DeepEquals method on the XLinq API.

XDocument doc1 = GetDoc1(); 
XDocument doc2 = GetDoc2(); 
 
if(XNode.DeepEquals(doc1, doc2)) 
{ 
 
} 

See also Equality Semantics of LINQ to XML Trees

like image 108
Max Toro Avatar answered Oct 09 '22 03:10

Max Toro


No. XmlDocument does not override the behavior of the Equals() method so, it is in fact just performing reference equality - which will fail in your example, unless the documents are actually the same object instance.

If you want to compare the contents (attributes, elements, commments, PIs, etc) of a document you will have to implement that logic yourself. Be warned: it's not trivial.

Depending on your exact scenario, you may be able to remove all non-essential whitespace from the document (which itself can be tricky) and them compare the resulting xml text. This is not perfect - it fails for documents that are semantically identical, but differ in things like how namespaces are used and declared, or whether certain values are escaped or not, the order of elements, and so on. As I said before, XML comparison is not trivial.

You also need to clearly define what it means for two XML documents to be "identical". Does element or attribute ordering matter? Does case (in text nodes) matter? Should you ignore superfluous CDATA sections? Do processing instructions count? What about fully qualified vs. partially qualified namespaces?

In any general purpose implementation, you're likely going to want to transform both documents into some canonical form (be it XML or some other representation) and then compare the canonicalized content.

Tools already exist that perform XML differencing, like Microsoft XML Diff/Patch, you may be able to leverage that to identify differences between two documents. To my knowledge that tool is not distributed in source form ... so to use it in an embedded application you would need to script the process (if you plan to use it, you should first verify that the licensing terms allow it's use and redistribution).

EDIT: Check out @Max Toro's answer if you're using .NET 3.5 SP1, as apparently there's an option in XLinq that may be helpful. Nice to know it exists.

like image 42
LBushkin Avatar answered Oct 09 '22 03:10

LBushkin