
Command-line XML validator for Windows

I've always found validation against a schema to be an invaluable ward against thinkos and would like to incorporate validation checks as part of a project where I frequently need to hand-write XML files a few hundred lines in length. My text editor has a fairly nice CLI integration feature, so I'm looking for a command-line validator.

When I didn't find any clear winners via Google, I poked around here and found a similar question, but none of the tools suggested there quite fit my needs:

  • libxml (via cygwin) — does not report line numbers; I have no idea where my errors are!
  • msxml — cannot be run from the command line?
  • xerces-c — seems to require a copy of Visual C?
  • xerces2-j — cannot be run from the command line?
  • xmlstarlet — insufficient XSD support*

(*The schema I'm validating against uses substitution groups — inappropriately, but it's external to the project, so I can't change it — which causes xmlstarlet to choke even on valid files.)

Normally, this is the point in solving a problem at which I'd give up on looking for an existing solution and reach for the Python-hammer, but Python's XML support is notoriously… well… actually, let's just leave it at "notorious".

So I'm back to looking for a pre-existing tool. My requirements are pretty simple:

  • runs on Win32 (Windows XP SP3, specifically)
  • command-line; my editor can work with just about any combination of stdin/-out/-err, arguments, temp files, etc.
  • reasonably complete XSD support (particularly namespaces and substitution groups)
  • reports the line number where the error occurred!

Does such a tool exist? I'd prefer not to have to install Visual Studio and friends (too bloated, IMO), but I do already have both Cygwin and Python installed.

Ben Blank asked Jul 23 '09 15:07


1 Answer

Your first option, xmllint (libxml2), does give line numbers for errors in the XML (and also in the XSD). You probably just need a later version. I just confirmed both using my copy, which is:

>  xmllint --version
xmllint: using libxml version 20627

Example output:

invalidXml.xml:4: element c: Schemas validity error : Element 'c': This element is not expected. Expected is ( b ).
invalidXml.xml fails to validate

for this XML file (invalidXml.xml):

<?xml version="1.0"?>
<invalidXmlEg>
  <a/>
<!--  <b></b> -->
  <c/>
</invalidXmlEg>

Where the xsd is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invalidXmlEg">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string" />
        <xs:element name="b" type="xs:string" />
        <xs:element name="c" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
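
Since you already have Python installed, here is a rough sketch of how the line numbers could be pulled out of that output for an editor integration. It assumes xmllint is on your PATH, that it is run with its --noout and --schema options, and that errors arrive on stderr in the "file:line: message" shape shown above. The function names (parse_errors, validate) are made up for illustration and are not part of libxml2.

```python
import re
import subprocess

# Matches lines such as:
#   invalidXml.xml:4: element c: Schemas validity error : Element 'c': ...
# The non-greedy file group tolerates Windows drive letters (C:\...).
ERROR_LINE = re.compile(r"^(?P<file>.+?):(?P<line>\d+): (?P<message>.*)$")


def parse_errors(output):
    """Return (file, line_number, message) tuples found in xmllint output."""
    errors = []
    for raw in output.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            errors.append(
                (match.group("file"), int(match.group("line")), match.group("message"))
            )
    return errors


def validate(xml_path, xsd_path, xmllint="xmllint"):
    """Run xmllint against a schema and return the parsed errors.

    Assumption: the executable name/path in `xmllint` resolves on this machine.
    """
    result = subprocess.run(
        [xmllint, "--noout", "--schema", xsd_path, xml_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Validation messages are expected on stderr; summary lines such as
    # "... fails to validate" carry no line number and are skipped.
    return parse_errors(result.stderr)
```

Calling validate("invalidXml.xml", "invalidXml.xsd") on the example above should give a single entry pointing at line 4, which an editor can then use to jump to the error.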

NOTE: I have noticed that xmllint will accept element names that it shouldn't (e.g. "<invalidXml.xsd>"), but this doesn't seem to affect your task.

EDIT adding the "compiled with" part of the version:

 compiled with: Threads Tree Output Push Reader
 Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy
 C14N Catalog XPath XPointer XInclude Iconv ISO8859X
 Unicode Regexps Automata Expr Schemas Schematron
 Modules Debug Zlib 
13ren answered Oct 20 '22 23:10