Clearly I need to (a) convert both strings to canonical XML or (b) compare their parse trees. The following doesn't work because the document object returned doesn't have a sensible == defined:
Nokogiri.XML(doc_a) == Nokogiri.XML(doc_b)
Nor does the following, because Nokogiri's to_xml leaves some internal whitespace:
Nokogiri.XML(doc_a).to_xml == Nokogiri.XML(doc_b).to_xml
This is a reasonable approximation of equality (and will work for most cases), but it's not quite right:
Nokogiri.XML(doc_a).to_xml.squeeze(' ') == Nokogiri.XML(doc_b).to_xml.squeeze(' ')
I'm already using Nokogiri, so I'd prefer to stick with it, but I'll use whatever library works.
There are actually a couple of good Nokogiri-based libraries for checking the equivalence of XML trees, including equivalent-xml and nokogiri-diff, that may be helpful.
I prefer equivalent-xml because it provides a little more flexibility (perhaps at the cost of strictness?), allowing you to compare with or without regard for element order or whitespace.
If you are looking for structural equality and don't care about the order of tags and attributes, the xml-simple library is probably a good choice. It converts the XML into Ruby data structures (hashes and arrays), which can be safely compared with the == operator.
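To show the idea of comparing plain Ruby data rather than strings, here is a hand-rolled sketch. It is not xml-simple's API: it uses REXML, which ships with Ruby, and the helper names element_to_data and xml_to_data are hypothetical. It assumes that attribute order, sibling order, and surrounding whitespace are all insignificant, and it compares prefixed names as written without resolving namespaces.

```ruby
require 'rexml/document'

# Hypothetical helper: turn an element into plain Ruby data of the form
# [name, attributes, text, children].
def element_to_data(element)
  attrs = {}
  element.attributes.each { |name, value| attrs[name] = value }

  # Join the element's own text nodes and drop surrounding whitespace.
  text = element.texts.map(&:value).join.strip

  children = element.elements.map { |child| element_to_data(child) }

  # Sort attributes and children so that their order does not affect ==.
  [element.expanded_name, attrs.sort.to_h, text, children.sort_by(&:inspect)]
end

# Hypothetical helper: parse a string and convert its root element.
def xml_to_data(xml)
  element_to_data(REXML::Document.new(xml).root)
end

doc_a = '<order id="1" status="new"><item>apple</item><item>pear</item></order>'
doc_b = <<~XML
  <order status="new" id="1">
    <item>pear</item>
    <item>apple</item>
  </order>
XML

puts xml_to_data(doc_a) == xml_to_data(doc_b)
```

If sibling order does matter for your documents, dropping the sort_by call on the children makes the comparison order-sensitive.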
Converting them to strings won't be very successful. For example, if an element has two attributes, does the order really matter? In most cases, no. Does the order of children of a given node? Depends what you're doing. But if the answer to one of those questions is "no", then a simple string comparison is a kludge at best.
There isn't anything in Nokogiri to do it for you; you'll have to build it yourself. Aaron Patterson discusses some of the issues here:
As far as the XML document is concerned, no two nodes are ever equal. Every node in a document is different. Every node has many attributes to compare:
- Is the name the same?
- How about attributes?
- How about the namespace?
- What about number of children?
- Are all the children the same?
- Is its parent node the same?
- What about its position relative to sibling nodes?
Think about adding two nodes to the same document. They can never have the same position relative to sibling nodes, therefore two nodes in a document cannot be "equal".
You can however compare two different documents. But you need to answer those 7 questions yourself as you're walking the two trees. Your requirements for sameness may differ from others.
That's your best bet: walk the trees and make those comparisons.