 

How to validate an XSD schema with lxml, but ignore elements that match a given pattern?

One can use lxml to validate XML files against a given XSD schema.

Is there a way to apply this validation in a less strict sense, ignoring all elements which contain special expressions?

Consider the following example. Say I have an xml_file:

<foo>99</foo>
<foo>{{var1}}</foo>
<foo>{{var2}}</foo>
<foo>999</foo>

Now I run a program on this file, which replaces the {{...}}-expressions and produces a new file:

xml_file_new:

<foo>99</foo>
<foo>23</foo>
<foo>42</foo>
<foo>999</foo>
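The replacement program itself is not part of the question. As a rough, hypothetical stand-in for it, the substitution step could look like the sketch below. The function name `render` and the `values` mapping are assumptions for illustration only, and placeholder names are assumed to be word characters.

```python
import re

# Matches {{name}} placeholders (hypothetical: names are word characters only).
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict) -> str:
    """Hypothetical stand-in for the asker's program: replace each
    {{name}} with str(values[name]). A missing name raises KeyError."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


xml_file_text = """<foo>99</foo>
<foo>{{var1}}</foo>
<foo>{{var2}}</foo>
<foo>999</foo>"""

xml_file_new_text = render(xml_file_text, {"var1": 23, "var2": 42})
print(xml_file_new_text)
```

With `var1 = 23` and `var2 = 42`, this yields the four lines of xml_file_new shown above.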

So far, I can use lxml to validate the new XML file as follows:

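from lxml import etree

xml_root = etree.parse(xml_file_new)  # the XML file after the {{...}} replacement
xsd_root = etree.parse(xsd_file)      # the XSD schema document
schema = etree.XMLSchema(xsd_root)    # compile the schema
schema.validate(xml_root)             # True if valid, False otherwise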
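`XMLSchema.validate()` only returns `True` or `False`. The self-contained sketch below uses an inline strict schema instead of files and wraps the `<foo>` elements in a single `<root>` element (an assumption, since the excerpts above show only the `<foo>` lines). It shows how the reasons for a failure can be read from `schema.error_log`:

```python
from lxml import etree

# Inline strict schema (assumed structure): <root> holds integer <foo> elements.
STRICT_XSD = b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="foo" type="xs:integer" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = etree.XMLSchema(etree.XML(STRICT_XSD))

new_doc = etree.XML(b"<root><foo>99</foo><foo>23</foo><foo>42</foo><foo>999</foo></root>")
old_doc = etree.XML(b"<root><foo>99</foo><foo>{{var1}}</foo><foo>{{var2}}</foo><foo>999</foo></root>")

print(schema.validate(new_doc))  # True
print(schema.validate(old_doc))  # False: '{{var1}}' is not an xs:integer

# error_log holds the errors collected by the most recent validate() call.
for error in schema.error_log:
    print(error.line, error.message)
```

If raising an exception is preferred over checking a boolean, `schema.assertValid(doc)` raises `etree.DocumentInvalid` instead.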

The relevant point in my example is that the schema restricts the <foo> contents to integers.

As it stands, the schema cannot be applied to the old xml_file in advance. However, because my program performs some other expensive tasks before the replacement, I would like to do exactly that, ignoring all elements that contain any {{...}}-expressions:

<foo>99</foo>       <!-- should be checked-->
<foo>{{var1}}</foo> <!-- should be ignored -->
<foo>{{var2}}</foo> <!-- should be ignored -->
<foo>999</foo>      <!-- should be checked-->

EDIT: One possible approach would be to define two schemas:

  • a strict schema for the new file, allowing only integers
  • a relaxed schema for the old file, allowing both integers and arbitrary strings containing {{...}}-expressions

However, to avoid the redundant work of keeping two schemas synchronized, one would need a way to generate the relaxed schema from the strict one automatically. This sounds promising, as both schemas have the same structure and differ only in the restrictions on certain element contents. Is there a simple concept offered by XSD that allows one to "inherit" from one schema and then "attach" additional relaxations to individual elements?

asked Oct 14 '25 23:10 by flonk


1 Answer

To answer the edited question, it is possible to compose schemas with the xs:include (and xs:import) mechanism. This way, you can declare common parts in a common schema for reuse, and use dedicated schemas for specialized type definitions, like so:

The common schema (common.xsd) describes the structure. Note that it uses FooType but does not define it:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Example structure -->
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="foo" type="FooType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

The relaxed schema, used to validate before the replacement, includes the components from the common schema and defines a relaxed FooType that accepts either an integer or a string matching {{...}}:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:include schemaLocation="common.xsd"/>

  <xs:simpleType name="FooType">
    <xs:union memberTypes="xs:integer">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="\{\{.*\}\}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

</xs:schema>
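To see which values the relaxed FooType accepts, the union can be approximated in plain Python. This is only an illustration, not how lxml evaluates the schema: it assumes the xs:integer lexical form is an optional sign followed by digits, and it uses `re.fullmatch` because XSD patterns are implicitly anchored to the whole value. The helper name `relaxed_foo_ok` is hypothetical.

```python
import re

# Approximation of the xs:integer lexical space: optional sign, then digits.
INTEGER = re.compile(r"[+-]?[0-9]+")
# Same pattern as in the relaxed schema; fullmatch mimics XSD's implicit anchoring.
PLACEHOLDER = re.compile(r"\{\{.*\}\}")


def relaxed_foo_ok(value: str) -> bool:
    """Hypothetical helper: True if either union member would accept value."""
    # xs:integer collapses surrounding whitespace before checking (approximated by strip).
    if INTEGER.fullmatch(value.strip()):
        return True
    # The xs:string member preserves whitespace, so the raw value must match.
    return PLACEHOLDER.fullmatch(value) is not None


for v in ["99", "{{var1}}", "x{{var1}}", "abc"]:
    print(repr(v), relaxed_foo_ok(v))
```

Because the pattern must match the entire element content, a value such as `x{{var1}}` is rejected; a pattern like `.*\{\{.*\}\}.*` would be needed to allow text around the placeholder.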

The strict schema, used to validate after the replacement, defines the strict version of FooType:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:include schemaLocation="common.xsd"/>

  <xs:simpleType name="FooType">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>

</xs:schema>
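Putting the pieces together with lxml might look like the sketch below. The file names relaxed.xsd and strict.xsd and the `load_schema` helper are assumptions (only common.xsd is named above), and the `<foo>` elements are assumed to sit inside the `<root>` element from the common schema. The schemas are written to a temporary directory and parsed from file paths so that `xs:include schemaLocation="common.xsd"` is resolved relative to the including schema.

```python
import tempfile
from pathlib import Path

from lxml import etree

COMMON = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="foo" type="FooType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

RELAXED = r"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="common.xsd"/>
  <xs:simpleType name="FooType">
    <xs:union memberTypes="xs:integer">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="\{\{.*\}\}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
</xs:schema>"""

STRICT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="common.xsd"/>
  <xs:simpleType name="FooType">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>
</xs:schema>"""

old_doc = etree.XML("<root><foo>99</foo><foo>{{var1}}</foo><foo>{{var2}}</foo><foo>999</foo></root>")
new_doc = etree.XML("<root><foo>99</foo><foo>23</foo><foo>42</foo><foo>999</foo></root>")


def load_schema(path: Path) -> etree.XMLSchema:
    # Parsing from a path gives the tree a base URL, so the relative
    # schemaLocation in xs:include is looked up next to this file.
    return etree.XMLSchema(etree.parse(str(path)))


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for name, text in [("common.xsd", COMMON), ("relaxed.xsd", RELAXED), ("strict.xsd", STRICT)]:
        (d / name).write_text(text, encoding="utf-8")
    relaxed = load_schema(d / "relaxed.xsd")
    strict = load_schema(d / "strict.xsd")

    print(relaxed.validate(old_doc))  # True: placeholders are allowed
    print(strict.validate(new_doc))   # True: all values are integers
    print(strict.validate(old_doc))   # False: '{{var1}}' is not an integer
```

Only the FooType definition differs between the two top-level schemas, so the structure lives in one place, which is the synchronization problem the question asks about.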

For completeness' sake, there are also alternative ways to do this, for example xs:redefine (XSD 1.0) or xs:override (XSD 1.1). However, these have more complex semantics, and personally I try to avoid them.

answered Oct 17 '25 13:10 by Meyer


