How to set up catalog files for xmllint?

I want to set up catalog files for xmllint so that the dcterms XML namespace is validated against a local document. I believe I have done everything right, but it doesn't seem to be working.

I am running OSX.

I have created a directory /etc/xml

$ mkdir /etc/xml
$ cd /etc/xml

I have downloaded dcterms.xsd to that directory

$ ls -l
-rw-r--r--  1 ibis  wheel  12507 24 Jul 11:42 dcterms.xsd

I have created a file named "catalog"

$ xmlcatalog --create > catalog

I have added the dcterms namespace to the catalog file

$ xmlcatalog --noout --add uri http://purl.org/dc/elements/1.1/ file:///etc/xml/dc.xsd catalog
$ xmlcatalog --noout --add uri http://purl.org/dc/terms/ file:///etc/xml/dcterms.xsd catalog
$ cat catalog
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://purl.org/dc/elements/1.1/" uri="file:///etc/xml/dc.xsd"/>
  <uri name="http://purl.org/dc/terms/" uri="file:///etc/xml/dcterms.xsd"/>
</catalog>

In a work directory, I have created a simple xml schema named Empty.xsd

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.example.org/Empty" xmlns:tns="http://www.example.org/Empty" elementFormDefault="qualified">
  <element name="empty">
    <complexType>
      <sequence>
        <any processContents="strict" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
      <anyAttribute></anyAttribute>
    </complexType>
  </element>
</schema>

Note that processContents is "strict".

I have created an XML file which should trigger all the validation:

<?xml version="1.0" encoding="UTF-8"?>
<empty xmlns="http://www.example.org/Empty" 
          xmlns:dcterms="http://purl.org/dc/terms/">
    <dcterms:title>A title</dcterms:title>
</empty>

Then I attempt to validate it.

$ xmllint --noout --valid --schema Empty.xsd Empty.xml
Empty.xml:2: validity error : Validation failed: no DTD found !
y xmlns="http://www.example.org/Empty" xmlns:dcterms="http://purl.org/dc/terms/"
                                                                               ^
Empty.xml:3: element title: Schemas validity error : Element '{http://purl.org/dc/terms/}title': No matching global element declaration available, but demanded by the strict wildcard.
Empty.xml fails to validate

I have set up a catalog as specified in the docs and pointed it at the local dcterms schema file. Why does xmllint fail to find it?

PaulMurrayCbr asked Oct 07 '22 03:10


1 Answer

xmllint doesn't auto-load XSD files based on the xmlns="something" attributes it finds in the XML file being parsed. It only uses the XSD given in the --schema parameter (and any schemas imported or included from it).

For the test, you could create a NonEmpty.xsd like this:

<?xml version="1.0" encoding="UTF-8"?>
<schema 
    xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.org/Empty"
    elementFormDefault="qualified">
  <include schemaLocation="Empty.xsd"/>
  <import schemaLocation="dcterms.xsd" namespace="http://purl.org/dc/terms/"/>
</schema>

Usage:

$ xmllint -debugent -noout -schema NonEmpty.xsd Empty.xml
new input from file: NonEmpty.xsd
new input from file: Empty.xsd
new input from file: dcterms.xsd
new input from file: http://www.w3.org/2001/03/xml.xsd
new input from file: dc.xsd
new input from file: dcmitype.xsd
new input from file: Empty.xml
Empty.xml validates

Now with a catalog file that maps the remote schema URLs to local copies:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://www.w3.org/2001/03/xml.xsd"          uri="file:///home/zsiga/proba/dcterms/2001_03_xml.xsd"/>
  <uri name="http://dublincore.org/schemas/xmls/qdc/dcterms.xsd" uri="file:///home/zsiga/proba/dcterms/dcterms.xsd"/>
</catalog>

Here's the NonEmpty2.xsd file:

<?xml version="1.0" encoding="UTF-8"?>
<schema 
    xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.org/Empty"
    elementFormDefault="qualified">
  <include schemaLocation="Empty.xsd"/>
  <import schemaLocation="http://dublincore.org/schemas/xmls/qdc/dcterms.xsd" namespace="http://purl.org/dc/terms/"/>
</schema>

And its usage:

$ XML_CATALOG_FILES=./catalog xmllint -debugent -noout \
    -schema NonEmpty2.xsd Empty.xml
new input from file: NonEmpty2.xsd
new input from file: Empty.xsd
new input from file: file:///home/zsiga/proba/dcterms/dcterms.xsd
new input from file: file:///home/zsiga/proba/dcterms/2001_03_xml.xsd
new input from file: file:///home/zsiga/proba/dcterms/dc.xsd
new input from file: file:///home/zsiga/proba/dcterms/dcmitype.xsd
new input from file: Empty.xml
Empty.xml validates
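
To check a mapping independently of xmllint, the catalog can also be queried directly with xmlcatalog, which takes a catalog file followed by the identifiers to look up. The following is only a minimal sketch: it writes a throwaway one-entry copy of the catalog above into a scratch directory (the `workdir` name is an assumption made for illustration) and skips the lookup if xmlcatalog is not installed.

```shell
#!/bin/sh
# Sketch: ask a catalog what a given URL resolves to.
# Assumption: the scratch directory exists only for this illustration.
workdir=$(mktemp -d)

cat > "$workdir/catalog" <<'EOF'
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://dublincore.org/schemas/xmls/qdc/dcterms.xsd" uri="file:///home/zsiga/proba/dcterms/dcterms.xsd"/>
</catalog>
EOF

url="http://dublincore.org/schemas/xmls/qdc/dcterms.xsd"

if command -v xmlcatalog >/dev/null 2>&1; then
    # Prints the mapped location if the catalog has a matching entry.
    # For <uri> entries a "No entry for SYSTEM" line may be printed first,
    # because the system lookup is tried before the uri lookup.
    xmlcatalog "$workdir/catalog" "$url"
else
    echo "xmlcatalog not installed; skipping lookup" >&2
fi
```

If the lookup prints the local file:// location, the catalog entry itself is fine, and any remaining problem is in how the schema refers to that URL.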

--- Edit 2020.11.02. ---

I would like to suggest using the <system> element (with its systemId attribute) in the catalog, together with relative path names:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/2001/03/xml.xsd"                  uri="2001_03_xml.xsd"/>
  <system systemId="http://dublincore.org/schemas/xmls/qdc/dcterms.xsd" uri="dcterms.xsd"/>
</catalog>

The result is the same, but some programs prefer <system> over <uri>. Relative path names (relative to the location of the catalog file) might also be easier to handle.
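
A catalog of this shape does not have to be written by hand. As a rough sketch, the same two <system> entries can be added with xmlcatalog's --create and --add options. The scratch directory (`catalog_dir`) is a hypothetical location chosen for illustration, and the script does nothing if xmlcatalog is missing.

```shell
#!/bin/sh
# Sketch: build a <system>-based catalog with relative paths via xmlcatalog.
# Assumption: catalog_dir is a throwaway directory, not a required location.
catalog_dir=$(mktemp -d)
catalog_file="$catalog_dir/catalog"

if command -v xmlcatalog >/dev/null 2>&1; then
    # With --noout the result is written back to the catalog file
    # instead of being printed to standard output.
    xmlcatalog --noout --create "$catalog_file"
    xmlcatalog --noout --add system \
        "http://www.w3.org/2001/03/xml.xsd" "2001_03_xml.xsd" "$catalog_file"
    xmlcatalog --noout --add system \
        "http://dublincore.org/schemas/xmls/qdc/dcterms.xsd" "dcterms.xsd" "$catalog_file"
    cat "$catalog_file"
else
    echo "xmlcatalog not installed; nothing created" >&2
fi
```

The local copies (2001_03_xml.xsd, dcterms.xsd and the files they import) are expected to sit next to the catalog file for the relative paths to resolve.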

Lorinczy Zsigmond answered Oct 13 '22 10:10