I get the following error while trying to validate XML against a schema:
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15
The issue reproduces with lxml >= 6.0.0, and only on Linux (tested on Ubuntu 20 and 22).
lxml 6.0.2 works fine on Windows (10 and 11).
Below is a simplified example of my use case.
main.xml
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<xi:include href="include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>
</elements>
</root>
include.xml
<?xml version="1.0" encoding="UTF-8"?>
<elements>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element>
<element name="element2" foo="foo2">Text 2: This content is included from another file.</element>
<element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
transform.xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Identity transform: copy everything by default -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<!-- Match only <element> with name="element2" and override foo and name -->
<xsl:template match="element[@name='element2']">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:attribute name="foo">spam</xsl:attribute>
<xsl:attribute name="name">message99</xsl:attribute>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
schema.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2009/01/xml.xsd"/>
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="elements">
<xs:complexType>
<xs:sequence minOccurs="1" maxOccurs="unbounded">
<xs:element name="element" minOccurs="1" maxOccurs="unbounded">
<xs:complexType mixed="true">
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:attribute name="foo" type="xs:string" use="required"/>
<xs:attributeGroup ref="xml:specialAttrs"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Line 15 in schema.xsd (the xs:attributeGroup reference) is needed when include.xml is not in the same directory as main.xml and is referenced via a relative path.
E.g. <xi:include href="../include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>
In this case, the included elements will have an extra attribute added (xml:base):
<element name="element1" foo="foo1" xml:base="../include.xml">Text 1: This content is included from another file.</element>
xml_parse.py
#!/usr/bin/env python3
import os
import lxml
from lxml import etree
print("Using lxml version {0}".format(lxml.__version__), end="\n\n")
tree = etree.parse("main.xml")
tree.xinclude()
# Apply transformations
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    xslt = etree.parse("transform.xslt")
    transform = etree.XSLT(xslt)
    result = transform(tree)
    tree._setroot(result.getroot())
print(etree.tostring(tree, pretty_print=True).decode())
schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
if schema.validate(tree): # Validate
    print("XML is valid.")
else:
    print("XML is invalid!")
    for error in schema.error_log:
        print(error.message)
Below is the example output from my Ubuntu 20 machine:
bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>
Traceback (most recent call last):
  File "/opt/xml_parse.py", line 20, in <module>
    schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
  File "src/lxml/xmlschema.pxi", line 90, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15
bogey@machine:/opt/xml_schema$ pip install lxml==5.4.0
Defaulting to user installation because normal site-packages is not writeable
Collecting lxml==5.4.0
Downloading lxml-5.4.0-cp310-cp310-manylinux_2_28_x86_64.whl (5.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 MB 12.2 MB/s eta 0:00:00
Installing collected packages: lxml
Attempting uninstall: lxml
Found existing installation: lxml 6.0.2
Uninstalling lxml-6.0.2:
Successfully uninstalled lxml-6.0.2
Successfully installed lxml-5.4.0
bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 5.4.0
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>
XML is valid.
Output on Windows machine:
(venv310_win) PS C:\xml_schema> python .\xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>
XML is valid.
What's the deal? Any ideas would be appreciated. Thanks.
EDIT: library versions on both systems.
Windows
Python : sys.version_info(major=3, minor=11, micro=8, releaselevel='final', serial=0)
etree : (6, 0, 2, 0)
libxml used : (2, 11, 9)
libxml compiled : (2, 11, 9)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
Linux
Python : sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
etree : (6, 0, 0, 0)
libxml used : (2, 14, 4)
libxml compiled : (2, 14, 4)
libxslt used : (1, 1, 43)
libxslt compiled : (1, 1, 43)
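The version details above can be gathered with a short script. The following is a minimal sketch, assuming lxml is installed; the helper name `collect_versions` is made up for illustration, while the `LXML_VERSION`, `LIBXML_*` and `LIBXSLT_*` attributes are the ones lxml.etree exposes. Comparing "libxml used" between two machines is a quick way to spot that they bundle different libxml2 builds.

```python
import sys


def collect_versions():
    """Return (label, value) pairs for Python and, if available, lxml."""
    rows = [("Python", sys.version_info)]
    try:
        from lxml import etree
    except ImportError:
        # lxml is not installed here; report only the Python version.
        return rows
    rows += [
        ("etree", etree.LXML_VERSION),
        ("libxml used", etree.LIBXML_VERSION),
        ("libxml compiled", etree.LIBXML_COMPILED_VERSION),
        ("libxslt used", etree.LIBXSLT_VERSION),
        ("libxslt compiled", etree.LIBXSLT_COMPILED_VERSION),
    ]
    return rows


if __name__ == "__main__":
    for label, value in collect_versions():
        print("%-20s: %s" % (label, value))
```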
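Recent libxml2 versions no longer download external resources such as a remote schemaLocation by default, for security reasons; such references have to be resolved locally, for example through an XML catalog. This fits the version details in the question: the failing Linux build uses libxml2 2.14.4, while the working Windows build uses 2.11.9. A custom catalog can be written as follows.
The catalog entries are keyed by the schemaLocation value and point to a local copy of the xsd file, which must be downloaded first:
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>
wget "http://www.w3.org/2001/xml.xsd"
catalog.xml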
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.0//EN"
"http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="http://www.w3.org/2001/xml.xsd"
uri="xml.xsd"/>
<system systemId="http://www.w3.org/2001/xml.xsd"
uri="xml.xsd"/>
<uri name="http://www.w3.org/2001/xml.xsd"
uri="xml.xsd"/>
</catalog>
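If several remote schemas need mapping, the catalog can be generated rather than written by hand. This is a minimal sketch using only the standard library; `build_catalog` and `write_catalog` are hypothetical helper names, and the sketch emits only `system` and `uri` entries (no DOCTYPE), which is an assumption about what is sufficient for schemaLocation lookups.

```python
from xml.sax.saxutils import quoteattr

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(mappings):
    """Build an OASIS XML catalog document from {remote_url: local_file}."""
    lines = ['<?xml version="1.0"?>', '<catalog xmlns="%s">' % CATALOG_NS]
    for remote_url, local_file in mappings.items():
        # quoteattr() adds the surrounding quotes and escapes special characters.
        lines.append("  <system systemId=%s uri=%s/>"
                     % (quoteattr(remote_url), quoteattr(local_file)))
        lines.append("  <uri name=%s uri=%s/>"
                     % (quoteattr(remote_url), quoteattr(local_file)))
    lines.append("</catalog>")
    return "\n".join(lines) + "\n"


def write_catalog(path, mappings):
    """Write the generated catalog to disk as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_catalog(mappings))


# Example call (not executed here):
# write_catalog("catalog.xml", {"http://www.w3.org/2001/xml.xsd": "xml.xsd"})
```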
The custom catalog.xml can be used with lxml as follows
import os
import lxml
from lxml import etree
# Path to your XML Catalog file
catalog_file = "catalog.xml"
os.environ["XML_CATALOG_FILES"] = catalog_file
print("Using lxml version {0}".format(lxml.__version__), end="\n\n")
schema_tree = etree.parse("schema.xsd")
schema = etree.XMLSchema(etree=schema_tree)
tree = etree.parse("main.xml")
tree.xinclude()
# Apply transformations
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    xslt = etree.parse("transform.xslt")
    transform = etree.XSLT(xslt)
    result = transform(tree)
    tree._setroot(result.getroot())
print(etree.tostring(tree, pretty_print=True).decode())
if schema.validate(tree): # Validate
    print("XML is valid.")
else:
    print("XML is invalid!")
    for error in schema.error_log:
        print(error.message)
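The script above sets XML_CATALOG_FILES to a relative path, so it only finds the catalog when run from the directory that contains catalog.xml. A possible variation, sketched here with hypothetical helper names (`configure_catalog`, `compile_schema`), resolves the path first and sets the variable before lxml is imported. Setting it that early is a precaution, not a documented requirement.

```python
import os


def configure_catalog(catalog_path):
    """Point libxml2 at a catalog file via XML_CATALOG_FILES."""
    absolute = os.path.abspath(catalog_path)
    os.environ["XML_CATALOG_FILES"] = absolute
    return absolute


def compile_schema(schema_path, catalog_path="catalog.xml"):
    """Compile an XSD with the catalog configured beforehand."""
    configure_catalog(catalog_path)
    # Imported here so the environment variable is already set (a precaution).
    from lxml import etree
    return etree.XMLSchema(etree.parse(schema_path))


# Example call (not executed here):
# schema = compile_schema("schema.xsd")
```

The xmllint documentation describes XML_CATALOG_FILES as a space-separated list, so a catalog path that itself contains spaces may need different handling.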
Testing the catalog with xmllint
XML_CATALOG_FILES='catalog.xml' /home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml
main.xml validates
Running the script
python3.12 parse-so.py
Using lxml version 6.0.0
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
[REDACTED]
XML is valid.
This answer suggests removing schemaLocation from the xsd, but that does not fix the problem. Downloading a copy of xml.xsd and referencing it from schema.xsd does the trick:
wget "http://www.w3.org/2001/xml.xsd"
Change the import in schema.xsd to:
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>
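When the schema file should not be edited by hand, the same change can be applied with a small script. This is a rough sketch under the assumption that schemaLocation is written with double quotes exactly as in schema.xsd; `localize_schema_location` is a hypothetical helper, and plain string replacement is used deliberately so that prefixes such as `xs:` and `xml:` inside attribute values stay untouched.

```python
def localize_schema_location(xsd_text, remote_url, local_path):
    """Replace a remote schemaLocation with a local path in XSD source text."""
    old = 'schemaLocation="%s"' % remote_url
    new = 'schemaLocation="%s"' % local_path
    if old not in xsd_text:
        # Fail loudly instead of silently leaving the remote reference in place.
        raise ValueError("schemaLocation %r not found" % remote_url)
    return xsd_text.replace(old, new)


# Example call (not executed here; the output file name is arbitrary):
# with open("schema.xsd", encoding="utf-8") as src:
#     patched = localize_schema_location(
#         src.read(), "http://www.w3.org/2009/01/xml.xsd", "xml.xsd")
# with open("schema_local.xsd", "w", encoding="utf-8") as dst:
#     dst.write(patched)
```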
Note: the latest xmllint tool from the libxml2 Linux package fails with the same error, so this is not an lxml bug:
/home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml
I/O warning : failed to load "https://www.w3.org/2005/08/xml.xsd": No such file or directory
schema.xsd:3: element import: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'https://www.w3.org/2005/08/xml.xsd'. Skipping the import.
schema.xsd:15: element attributeGroup: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition.
WXS schema schema.xsd failed to compile
It works when referencing a local xsd file
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>
/home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml
main.xml validates