
Open source command line tool for Linux to diff XML files ignoring element order

Is there an open source command-line tool (for Linux) to diff XML files which ignores the element order?

Example input file a.xml:

<tag name="AAA">
  <attr name="b" value="1"/>
  <attr name="c" value="2"/>
  <attr name="a" value="3"/>
</tag>

<tag name="BBB">
  <attr name="x" value="111"/>
  <attr name="z" value="222"/>
</tag>
<tag name="BBB">
  <attr name="x" value="333"/>
  <attr name="z" value="444"/>
</tag>

b.xml:

<tag name="AAA">
  <attr name="a" value="3"/>
  <attr name="b" value="1"/>
  <attr name="c" value="2"/>
</tag>

<tag name="BBB">
  <attr name="z" value="444"/>
  <attr name="x" value="333"/>
</tag>
<tag name="BBB">
  <attr name="x" value="111"/>
  <attr name="z" value="222"/>
</tag>

Comparing these two files should therefore not output any differences. I have tried sorting the files with XSLT first:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="WINDOWS-1252" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*">
        <xsl:sort select="@*" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

The problem is that the <tag name="BBB"> elements are not sorted. They are simply output in the order in which they appear in the input.

I have already looked at diffXml, xDiff, XMLUnit, and xmlstarlet, but none of these solves the problem; the diff output should be human-readable, like the output of diff.

Any hints on how either the sorting or ignoring element-order diff can be solved? Thanks!

user1613270 asked Aug 21 '12 12:08





1 Answer

I had a similar problem and I eventually found: https://superuser.com/questions/79920/how-can-i-diff-two-xml-files

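That post suggests canonicalizing both files with xmllint and then running diff on the results. Since you are on Linux, this should work cleanly for you. It worked for me on my Mac, and it should work on Windows for anyone who has something like Cygwin installed. Note that canonicalization (C14N) puts the attributes of each element into a fixed order, but it does not reorder child elements: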

$ xmllint --c14n a.xml > sortedA.xml
$ xmllint --c14n b.xml > sortedB.xml
$ diff sortedA.xml sortedB.xml
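
If element order still shows up in the diff after canonicalization, the children can be sorted before comparing. Below is a minimal sketch in Python of that idea, not a tested tool. The function names (normalize, canonical_lines, xml_diff) and the wrapping <root> element are my own assumptions; the wrapper is only there because the sample files have several top-level elements and so are not well-formed on their own. It also assumes data-style XML with no meaningful text between elements, and Python 3.9+ for ET.indent.

```python
import difflib
import xml.etree.ElementTree as ET


def normalize(elem):
    """Recursively put attributes and child elements into a stable order."""
    # Drop whitespace-only text so indentation differences are ignored.
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    if elem.tail is not None and not elem.tail.strip():
        elem.tail = None
    # Re-insert attributes in sorted order (dicts keep insertion order).
    attrs = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(attrs)
    # Normalize the children first, then sort them by their serialized form.
    for child in elem:
        normalize(child)
    elem[:] = sorted(elem, key=ET.tostring)


def canonical_lines(xml_text):
    """Parse, normalize, and pretty-print XML as a list of lines."""
    root = ET.fromstring(xml_text)
    normalize(root)
    ET.indent(root)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode").splitlines()


def xml_diff(a_text, b_text):
    """Return a unified diff of the two normalized documents."""
    return list(difflib.unified_diff(
        canonical_lines(a_text), canonical_lines(b_text),
        "a.xml", "b.xml", lineterm=""))


# Sample data from the question, wrapped in a hypothetical <root> element.
A = """<root>
<tag name="AAA">
  <attr name="b" value="1"/>
  <attr name="c" value="2"/>
  <attr name="a" value="3"/>
</tag>
<tag name="BBB"><attr name="x" value="111"/><attr name="z" value="222"/></tag>
<tag name="BBB"><attr name="x" value="333"/><attr name="z" value="444"/></tag>
</root>"""

B = """<root>
<tag name="AAA">
  <attr name="a" value="3"/>
  <attr name="b" value="1"/>
  <attr name="c" value="2"/>
</tag>
<tag name="BBB"><attr name="z" value="444"/><attr name="x" value="333"/></tag>
<tag name="BBB"><attr name="x" value="111"/><attr name="z" value="222"/></tag>
</root>"""

print("\n".join(xml_diff(A, B)) or "no differences")  # prints: no differences
```

Sorting siblings by their full serialized form is what orders the two <tag name="BBB"> elements, which share the same attributes and differ only in their children. When the files really do differ, the output is an ordinary unified diff of the normalized, indented XML.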
James Oravec answered Nov 15 '22 20:11
