...or why do these files validate in Visual Studio 2010 but not with xmllint (see note 1)?
I'm currently working against a published XML schema where the original author's habit is to break the schemas down into several .xsd files, but where some schema files have the same targetNamespace. Is this really "allowed"?
Example (extremely simplified):
File      targetNamespace      Contents
------------------------------------------------------------
b1.xsd    uri:tempuri.org:b    complex type "fooType"
b2.xsd    uri:tempuri.org:b    simple type "barType"
a.xsd     uri:tempuri.org:a    imports b1.xsd and b2.xsd
                               definition of root element "foo", that
                               extends "b:fooType" with an attribute
                               of "b:barType"
(Complete file contents below.)
Then I have an XML file, data.xml, with this content:
<?xml version="1.0"?>
<foo bar="1"
     xmlns="uri:tempuri.org:a"
     xmlns:xs="http://www.w3.org/2001/XMLSchema" />
For a long time, I have believed that all of this was correct, since Visual Studio apparently allows this schema style. However, today I decided to set up a command line utility for validating XML files, and I chose xmllint.
When I ran xmllint --schema a.xsd data.xml, I was presented with this warning:
a.xsd:4: element import: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}import': Skipping import of schema located at 'b2.xsd' for the namespace 'uri:tempuri.org:b', since this namespace was already imported with the schema located at 'b1.xsd'.
The fact that the import of b2.xsd was skipped obviously leads to this error:
a.xsd:9: element attribute: Schemas parser error : attribute decl. 'bar', attribute 'type': The QName value '{uri:tempuri.org:b}barType' does not resolve to a(n) simple type definition.
If xmllint is correct, there would be an error in the published specs I'm working against. Is there? And Visual Studio would be wrong. Is it?
I do realize the difference between xs:import and xs:include. Right now, I just don't see how xs:include could fix things, since:
- b1.xsd and b2.xsd have the same targetNamespace
- they both differ in targetNamespace from a.xsd
- they don't know about each other
Is this a flaw in the original schema specification? I'm beginning to think that the third bullet point is crucial. Should the fact that they don't know about each other have led to placing them in different namespaces to begin with?
b1.xsd:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:b"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="fooType" />
</xs:schema>
b2.xsd:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:b"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="barType">
    <xs:restriction base="xs:integer" />
  </xs:simpleType>
</xs:schema>
a.xsd:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:a"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="uri:tempuri.org:b">
  <xs:import namespace="uri:tempuri.org:b" schemaLocation="b1.xsd" />
  <xs:import namespace="uri:tempuri.org:b" schemaLocation="b2.xsd" />
  <xs:element name="foo">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="b:fooType">
          <xs:attribute name="bar" type="b:barType" />
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
Notes:
1) I'm using the Windows port of libxml2/xmllint found at www.zlatkovic.com.
The crux of the problem here is what it means to have two different <import> elements that both refer to the same namespace.
It helps to clarify the meaning when you consider that the schemaLocation attribute of <import> is entirely optional. When you leave it out, you're just saying "I want to import the schema of namespace XYZ into this schema". The schemaLocation is just a hint as to where to find the definition of that other schema.
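As a small illustration, an import of the question's namespace without any location hint could be written as in the sketch below. How the processor then finds the components of uri:tempuri.org:b (a catalog, a command-line option, or something else) depends on the tool and is not shown here.

```xml
<!-- Inside a.xsd: import by namespace only, with no schemaLocation hint. -->
<xs:import namespace="uri:tempuri.org:b" />
```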
The precise meaning of <import> is a bit fuzzy when you read the W3C spec, possibly deliberately so. As a result, interpretations vary.
Some XML processors tolerate multiple <import> elements for the same namespace, and essentially amalgamate all of the schemaLocation documents into a single target.
Other processors are stricter, and decide that only one <import> per target namespace is valid. I think this is more correct, when you consider that schemaLocation is optional.
In addition to the VS and xmllint examples you gave, Xerces-J is also super-strict, and ignores subsequent <import> elements for the same target namespace, giving much the same error as xmllint does. XML Spy, on the other hand, is much more permissive (but then, XML Spy's validation is notoriously flaky).
To be safe, you should not have these multiple imports. A given namespace should have a single "master" document, which in turn has an <include> for each sub-document. This master is often highly artificial, acting only as a container for these sub-documents.
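Applied to the files in the question, that layout might look like the sketch below. The file name b.xsd is an assumption made up for this example; b1.xsd and b2.xsd stay unchanged.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- b.xsd (hypothetical name): container for everything in uri:tempuri.org:b -->
<xs:schema targetNamespace="uri:tempuri.org:b"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- include works here because all three documents share one targetNamespace -->
  <xs:include schemaLocation="b1.xsd" />
  <xs:include schemaLocation="b2.xsd" />
</xs:schema>
```

a.xsd would then replace its two imports with a single one pointing at the container:

```xml
<!-- Inside a.xsd: one import per namespace -->
<xs:import namespace="uri:tempuri.org:b" schemaLocation="b.xsd" />
```

With this arrangement b1.xsd and b2.xsd still don't need to reference each other; only the container knows about both.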
From what I've seen, this is generally considered "best practice" for XML Schema when it comes to maximum tool compatibility, but some will argue that it's a hack that takes away from elegant schema design.
Meh.