
How do I test XML equality in Ruby?

Tags: xml, ruby, testing

Clearly I need to (a) convert both strings to canonical XML or (b) compare their parse-trees. The following doesn't work because the document object returned doesn't have a sensible == defined.

Nokogiri.XML(doc_a) == Nokogiri.XML(doc_b)

Nor does the following, because Nokogiri's to_xml leaves some internal whitespace:

Nokogiri.XML(doc_a).to_xml == Nokogiri.XML(doc_b).to_xml

This is a reasonable approximation of equality (and will work for most cases), but it's not quite right, because squeeze(' ') only collapses runs of spaces and also alters whitespace inside text content:

Nokogiri.XML(doc_a).to_xml.squeeze(' ') == Nokogiri.XML(doc_b).to_xml.squeeze(' ')

I'm already using Nokogiri, so I'd prefer to stick with it, but I'll use whatever library works.

James A. Rosen asked Sep 15 '09 23:09


3 Answers

There are actually a couple of good Nokogiri-based libraries for checking the equivalence of XML trees, including equivalent-xml and nokogiri-diff, that may be helpful.

I prefer equivalent-xml because it provides a little more flexibility (perhaps at the cost of strictness?), allowing you to compare with or without regard for element order or whitespace.

cbeer answered Nov 19 '22 15:11


If you are looking for structural equality and don't care about the order of tags and attributes, the xml-simple library is probably a good choice. It converts the XML into Ruby data structures (hashes and arrays), which can be safely compared with the == operator.

sgt answered Nov 19 '22 16:11


Converting them to strings won't be very successful. For example, if an element has two attributes, does their order really matter? In most cases, no. Does the order of a node's children matter? That depends on what you're doing. But if the answer to either of those questions is "no", then a simple string comparison is a kludge at best.

There isn't anything in Nokogiri to do it for you; you'll have to build it yourself. Aaron Patterson discusses some of the issues here:

As far as the XML document is concerned, no two nodes are ever equal. Every node in a document is different. Every node has many attributes to compare:

  1. Is the name the same?
  2. How about attributes?
  3. How about the namespace?
  4. What about number of children?
  5. Are all the children the same?
  6. Is its parent node the same?
  7. What about its position relative to sibling nodes?

Think about adding two nodes to the same document. They can never have the same position relative to sibling nodes, therefore two nodes in a document cannot be "equal".

You can however compare two different documents. But you need to answer those 7 questions yourself as you're walking the two trees. Your requirements for sameness may differ from others.

That's your best bet: walk the trees and make those comparisons.

Pesto answered Nov 19 '22 14:11