
lxml: QName value does not resolve to a(n) attribute group definition

I get the following error while trying to validate XML against a schema:

lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15

The issue reproduces with lxml >= 6.0.0, and only on Linux (tested on Ubuntu 20 and 22).

lxml version 6.0.2 works well on Windows systems (10 and 11).

Below is a simplified example of my use case.

main.xml

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <title>Main XML</title>
    <elements>
        <element name="main element" foo="main foo">This text is from main.xml</element>
        <xi:include href="include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>
    </elements>
</root>

include.xml

<?xml version="1.0" encoding="UTF-8"?>
<elements>
    <element name="element1" foo="foo1">Text 1: This content is included from another file.</element>
    <element name="element2" foo="foo2">Text 2: This content is included from another file.</element>
    <element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>

transform.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity transform: copy everything by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Match only <element> with name="element2"; override foo and name -->
    <xsl:template match="element[@name='element2']">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:attribute name="foo">spam</xsl:attribute>
            <xsl:attribute name="name">message99</xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

schema.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2009/01/xml.xsd"/>
    <xs:element name="root">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="title" type="xs:string"/>
                <xs:element name="elements">
                    <xs:complexType>
                        <xs:sequence minOccurs="1" maxOccurs="unbounded">
                            <xs:element name="element" minOccurs="1" maxOccurs="unbounded">
                                <xs:complexType mixed="true">
                                    <xs:attribute name="name" type="xs:string" use="required"/>
                                    <xs:attribute name="foo" type="xs:string" use="required"/>
                                    <xs:attributeGroup ref="xml:specialAttrs"/>
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

Line 15 in schema.xsd (the xs:attributeGroup reference) is needed when include.xml is not in the same directory as main.xml and is referenced via a relative path.

E.g. <xi:include href="../include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>

In this case, the included elements will have an extra attribute added (xml:base): <element name="element1" foo="foo1" xml:base="../include.xml">Text 1: This content is included from another file.</element>

xml_parse.py

#!/usr/bin/env python3

import os
import lxml
from lxml import etree

print("Using lxml version {0}".format(lxml.__version__), end="\n\n")

tree = etree.parse("main.xml")
tree.xinclude()

# Apply transformations
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    xslt = etree.parse("transform.xslt")
    transform = etree.XSLT(xslt)
    result = transform(tree)
    tree._setroot(result.getroot())

print(etree.tostring(tree, pretty_print=True).decode())

schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
if schema.validate(tree): # Validate
    print("XML is valid.")
else:
    print("XML is invalid!")
    for error in schema.error_log:
        print(error.message)

Below is the example output from my Ubuntu 20 machine:

bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>

Traceback (most recent call last):
  File "/opt/xml_parse.py", line 20, in <module>
    schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
  File "src/lxml/xmlschema.pxi", line 90, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15

bogey@machine:/opt/xml_schema$ pip install lxml==5.4.0
Defaulting to user installation because normal site-packages is not writeable
Collecting lxml==5.4.0
Downloading lxml-5.4.0-cp310-cp310-manylinux_2_28_x86_64.whl (5.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 MB 12.2 MB/s eta 0:00:00
Installing collected packages: lxml
Attempting uninstall: lxml
Found existing installation: lxml 6.0.2
Uninstalling lxml-6.0.2:
Successfully uninstalled lxml-6.0.2
Successfully installed lxml-5.4.0

bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 5.4.0
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>

XML is valid.

Output on Windows machine:

(venv310_win) PS C:\xml_schema> python .\xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>

XML is valid.

What's the deal? Any ideas would be appreciated. Thanks.

EDIT: Windows

Python : sys.version_info(major=3, minor=11, micro=8, releaselevel='final', serial=0)
etree : (6, 0, 2, 0)
libxml used : (2, 11, 9)
libxml compiled : (2, 11, 9)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)

Linux

Python : sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
etree : (6, 0, 0, 0)
libxml used : (2, 14, 4)
libxml compiled : (2, 14, 4)
libxslt used : (1, 1, 43)
libxslt compiled : (1, 1, 43)
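
A version listing like the one above can be produced with a short script. This is a minimal sketch that relies on the version tuples exposed by lxml.etree; the helper name version_report is made up for illustration, and the fallback row for a missing lxml install is an addition of this sketch.

```python
import sys


def version_report():
    """Return (label, value) rows describing the Python/lxml/libxml2 stack."""
    rows = [("Python", sys.version_info)]
    try:
        from lxml import etree
    except ImportError:
        # lxml is not installed in this interpreter; report that and stop.
        rows.append(("etree", "not installed"))
        return rows
    rows.extend([
        ("etree", etree.LXML_VERSION),
        ("libxml used", etree.LIBXML_VERSION),
        ("libxml compiled", etree.LIBXML_COMPILED_VERSION),
        ("libxslt used", etree.LIBXSLT_VERSION),
        ("libxslt compiled", etree.LIBXSLT_COMPILED_VERSION),
    ])
    return rows


for label, value in version_report():
    print(f"{label:<16}: {value}")
```

Comparing the "libxml used" row between two machines shows whether they run the same libxml2, independently of the lxml version number.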

Asked Nov 14 '25 at 15:11 by Bogdan Prădatu


1 Answer

The right way

For security reasons, recent versions of libxml2 enforce the use of XML catalogs to resolve external resources. A custom catalog can be written as follows.

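In catalog.xml, each entry is keyed on the schemaLocation value of the import and maps it to a local copy of the xsd file, which must be downloaded first. For an import such as:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>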

wget "http://www.w3.org/2001/xml.xsd"

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.0//EN"
                      "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
  <system systemId="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
  <uri name="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
</catalog>
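
If several remote schemas need mapping, the catalog can also be generated instead of written by hand. The following is a sketch using only the standard library; build_catalog is a hypothetical helper name, it emits the same three entry types as the catalog above, and it omits the DOCTYPE on the assumption that libxml2 does not need it to read the catalog.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(mappings):
    """Return catalog XML text mapping each remote URL to a local file."""
    # Serialize the catalog namespace as the default namespace (no prefix).
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for remote_url, local_path in mappings.items():
        # Mirror the three entry types used in the hand-written catalog.
        ET.SubElement(root, f"{{{CATALOG_NS}}}public",
                      {"publicId": remote_url, "uri": local_path})
        ET.SubElement(root, f"{{{CATALOG_NS}}}system",
                      {"systemId": remote_url, "uri": local_path})
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri",
                      {"name": remote_url, "uri": local_path})
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


catalog_text = build_catalog({"http://www.w3.org/2001/xml.xsd": "xml.xsd"})
print(catalog_text)
```

The key of each mapping has to match the schemaLocation string in schema.xsd exactly, otherwise the catalog lookup will not hit.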

The custom catalog.xml can be used with lxml as follows

import os
import lxml
from lxml import etree

# Path to your XML Catalog file
catalog_file = "catalog.xml"
os.environ["XML_CATALOG_FILES"] = catalog_file

print("Using lxml version {0}".format(lxml.__version__), end="\n\n")

schema_tree = etree.parse("schema.xsd")
schema = etree.XMLSchema(etree=schema_tree)

tree = etree.parse("main.xml")
tree.xinclude()

# Apply transformations
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    xslt = etree.parse("transform.xslt")
    transform = etree.XSLT(xslt)
    result = transform(tree)
    tree._setroot(result.getroot())

print(etree.tostring(tree, pretty_print=True).decode())

if schema.validate(tree): # Validate
    print("XML is valid.")
else:
    print("XML is invalid!")
    for error in schema.error_log:
        print(error.message)
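
The script above sets XML_CATALOG_FILES to a relative path, so it depends on the directory the script is started from. A small sketch of one way to make that more robust is shown here; use_catalog is a hypothetical helper, and calling it before any lxml parsing takes place is an assumption about when libxml2 reads the variable.

```python
import os
from pathlib import Path


def use_catalog(catalog_path):
    """Point libxml2 at a catalog file via XML_CATALOG_FILES."""
    catalog = Path(catalog_path).resolve()
    if not catalog.is_file():
        raise FileNotFoundError(f"catalog not found: {catalog}")
    # An absolute path keeps the setting independent of the working directory.
    os.environ["XML_CATALOG_FILES"] = str(catalog)
    return catalog
```

In the script it would replace the two catalog lines, for example use_catalog(Path(__file__).with_name("catalog.xml")), so that a missing catalog fails early with a clear message instead of surfacing later as the schema parse error.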

Testing the catalog with xmllint

XML_CATALOG_FILES='catalog.xml' /home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml 
main.xml validates

Running the script

python3.12 parse-so.py 
Using lxml version 6.0.0

Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
[REDACTED]

XML is valid.

Alternative: edit xsd

Another answer suggests removing schemaLocation from the xsd, but that does not fix the problem. Downloading a copy of xml.xsd and referencing it from schema.xsd does the trick:

wget "http://www.w3.org/2001/xml.xsd"

Then change the import in schema.xsd to:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>
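
The same edit can be scripted when the schema is generated or vendored. This is a sketch using the standard library's ElementTree, not part of the original fix; localize_xml_import is a hypothetical helper and the sample schema is cut down to the import alone.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def localize_xml_import(schema_text, local_path="xml.xsd"):
    """Return schema text whose xml-namespace import points at a local file."""
    # Keep the conventional "xs" prefix so QName values such as
    # type="xs:string" still resolve after re-serialization.
    ET.register_namespace("xs", XSD_NS)
    root = ET.fromstring(schema_text)
    for imp in root.iter(f"{{{XSD_NS}}}import"):
        if imp.get("namespace") == XML_NS:
            imp.set("schemaLocation", local_path)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:import namespace="http://www.w3.org/XML/1998/namespace" '
    'schemaLocation="http://www.w3.org/2001/xml.xsd"/>'
    '</xs:schema>'
)
print(localize_xml_import(sample))
```

Re-serializing with ElementTree drops comments and the XML declaration, so for a one-off change editing the file by hand is just as good.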

Note:
The latest xmllint tool from the libxml2 Linux package fails with the same error, so this is not an lxml bug.

/home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml
I/O warning : failed to load "https://www.w3.org/2005/08/xml.xsd": No such file or directory
schema.xsd:3: element import: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'https://www.w3.org/2005/08/xml.xsd'. Skipping the import.
schema.xsd:15: element attributeGroup: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition.
WXS schema schema.xsd failed to compile

It works when referencing a local xsd file

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>

/home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml 
main.xml validates
Answered Nov 17 '25 at 05:11 by LMC


