 

XML Schema Header & Namespace Config

Tags: xml, schema, xsd

I'm migrating from DTD to XSD, and for some reason the transition is a bumpy one. I understand how to define the schema once I'm inside the <xs:schema> root tag, but getting past the header & namespace declaration stuff is proving to be especially confusing for me.

I have been trying to follow the well-laid-out tutorial on W3S, but even that tutorial seems to assume a lot of knowledge up front.

I guess what I'm looking for is a King's English explanation of which attributes do what, where they go, and why:

  • xmlns
  • xmlns:xs
  • xmlns:xsi
  • targetNamespace
  • xsi:schemaLocation

And in some cases I see different variations of these elements/attributes, such as xsi which seems to have two different notations like xsi:schemaLocation="..." and xs:import schemaLocation="...".

I guess between all these slight variations I can't seem to make heads or tails of what each of these does. Thanks in advance for bringing any clarity to this confusion!

asked Nov 22 '11 15:11 by IAmYourFaja



1 Answer

The first thing you'll need to understand is XML namespaces. If you have some time to spare, you could read the specification. I found this to be one of the clearer specs related to XML. It doesn't matter if you don't understand everything it says; it's a good basis. But here's a quick rundown.

XML elements and attributes have a name. When you see <test att="hello"/>, you're looking at an element with name "test", in which we have an attribute with name "att". But this isn't really the entire story...

XML is a syntax that allows you to mix content from different markup languages. For example, when using XSLT to turn an XML document into an XHTML page, you're dealing with at least three markup languages defined in XML: your input document's, XSLT and XHTML. Such mixes would become rather hard if each one reserved its own element/attribute names and no collisions were ever allowed.

Enter XML namespaces. An XML namespace defines a "sphere" within which element and attribute names have actual semantics. The element "template" has a well-defined meaning in the XSLT namespace. The element "complexType" has a well-defined meaning in the XML Schema namespace. If you wish to use either in your own markup language using XML, then that's possible provided you do so in a different namespace.

In order to make sure a namespace is unique, you'll need to provide some unique identifier. The specification settled on the use of URIs, most often in the form of an HTTP URL. The reason for this is simple: such URLs tend to be good unique identifiers. But it's also a very common cause of confusion, because people think the URLs really hold meaning or will be accessed over the network during XML processing. Know very well that this is not the case! The URL is not required to point to any existing resource. It will not go through any transformation or be resolved to a network address. Even if two URLs point to the exact same thing, the moment they differ by one character they are considered different namespaces. A namespace identifier is just a string, and a case-sensitive one at that. Nothing more.

With the introduction of namespaces, the name of an XML element or attribute suddenly consists of two parts: a namespace and a local name. That "test" in <test/> is only the local name. The so-called "fully-qualified name" (the Namespaces specification calls it the "expanded name") is the kinda invisible combination of the namespace and the local name. Sometimes the notation {namespace URI}local-name is used, but that's nothing more than convention.
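
One way to see that a namespace is only a case-sensitive string is a small Python sketch. It assumes the standard library's xml.etree.ElementTree, which is just a convenient tool for illustration and happens to report names in the {namespace URI}local-name notation. The second URI, with FOO in capitals, is made up for the comparison.

```python
import xml.etree.ElementTree as ET

# Two documents whose namespace URIs differ only in letter case.
# The second URI is a made-up variant used purely for comparison.
lower = ET.fromstring('<test xmlns="http://www.foo.com"/>')
upper = ET.fromstring('<test xmlns="http://www.FOO.com"/>')

# ElementTree exposes the expanded name as "{namespace URI}local-name".
print(lower.tag)
print(upper.tag)

# The URIs are compared as plain strings, so these are different names.
same_name = lower.tag == upper.tag
print(same_name)
```

Nothing in this sketch asks for a network lookup of either URI; the parser only uses them as strings to build names.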

So now we need to be able to use namespaces in an XML document. In order to declare a namespace, XML has a hard-coded mechanism. It uses the special string xmlns to allow namespace declarations to be made. It can be done in one of two ways: binding the namespace to a prefix, or declaring it as the default namespace.

When binding to a prefix, the form is something like this: xmlns:prefix="namespace URI". Here's an example in an XML document:

<foo:root xmlns:foo="http://www.foo.com">
    <foo:test />
</foo:root>

We've now bound namespace http://www.foo.com to the prefix foo. Wherever this prefix is put in front of an element or attribute's name, we're stating that they're part of that namespace.

What's very important to note here is that the actual prefix means absolutely nothing. The following XML document is semantically the exact same:

<bar:root xmlns:bar="http://www.foo.com">
    <bar:test />
</bar:root>

The prefix is merely a convenient way of representing the namespace. It saves us from having to use the URI entirely every time.
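
If you'd like to convince yourself that the prefix carries no meaning, here is a minimal sketch in Python, again assuming the standard library's xml.etree.ElementTree as the parser. The helper name expanded_names is hypothetical. It parses the foo and bar versions of the document and compares the names the parser actually sees.

```python
import xml.etree.ElementTree as ET

with_foo = ET.fromstring(
    '<foo:root xmlns:foo="http://www.foo.com"><foo:test /></foo:root>'
)
with_bar = ET.fromstring(
    '<bar:root xmlns:bar="http://www.foo.com"><bar:test /></bar:root>'
)


def expanded_names(root):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in root.iter()]


names_foo = expanded_names(with_foo)
names_bar = expanded_names(with_bar)

# The prefixes are gone after parsing; only namespace + local name remain.
print(names_foo)
print(names_foo == names_bar)
```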

Next up is the default namespace. A default namespace can be declared with xmlns="namespace URI". You could abstractly think of this as binding a namespace to the empty prefix. Once again the same XML document, but this time without prefixes:

<root xmlns="http://www.foo.com">
    <test />
</root>

This is a bit more convenient to work with. So why have prefixes at all? They start playing a role when we're mixing content from different namespaces:

<root xmlns="http://www.foo.com">
    <so:test xmlns:so="http://stackoverflow.com" />
</root>

This time it's a different XML document. Our root element lies in the http://www.foo.com namespace, but the test element lies in http://stackoverflow.com because we've bound the so prefix to that and used it on test.

You also notice here that namespaces can be declared on any element in the XML document. The scope of that declaration (and binding to prefix if applicable) then becomes that element and its content.

This can sometimes become confusing, even more so since declarations may override each other. Check this document:

<root xmlns="http://www.foo.com">
    <test />
    <so:test xmlns:so="http://www.stackoverflow.com" xmlns="http://www.bar.com">
        <test />
    </so:test>
</root>

Take a moment and figure out what namespace each element is in... It's a good exercise.

root is in namespace http://www.foo.com. The first test element is also in that namespace, since we haven't used a prefix but we're in the scope of that default namespace. The second test element with prefix so lies in namespace http://www.stackoverflow.com because that's what we bound the prefix to.

So then there's the third, innermost test element. What namespace is it in? It doesn't have a prefix, so it must be in the default namespace. BUT, we've changed our default namespace in the second test element! So now that innermost element belongs to the http://www.bar.com namespace, not http://www.foo.com.
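
To check your answers mechanically, a short Python sketch along these lines could help. It assumes the standard library's xml.etree.ElementTree, and the helper split_expanded_name is a hypothetical name chosen for this illustration. It prints the namespace each element of the scoping exercise ends up in.

```python
import xml.etree.ElementTree as ET

document = """
<root xmlns="http://www.foo.com">
    <test />
    <so:test xmlns:so="http://www.stackoverflow.com" xmlns="http://www.bar.com">
        <test />
    </so:test>
</root>
"""


def split_expanded_name(tag):
    """Split "{uri}local" into (uri, local); uri is None when there is no namespace."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


# iter() walks the tree in document order: root, test, so:test, inner test.
resolved = [split_expanded_name(el.tag) for el in ET.fromstring(document).iter()]
for uri, local in resolved:
    print(local, "->", uri)
```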

Confused yet? Just remember the following:

  • A namespace is just a string. URIs are used as a convenient way to have unique identifiers.
  • A prefix is used to represent a namespace, but its name holds no meaning whatsoever. Just think of it as a placeholder.
  • You can set a default namespace. All elements within its scope that don't have a prefix then become part of it.
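
One caveat worth testing for yourself: the default namespace applies to unprefixed elements only. An unprefixed attribute is in no namespace at all, while a prefixed attribute is in whatever namespace its prefix is bound to. The sketch below is a hedged illustration using Python's standard xml.etree.ElementTree; the so:att attribute is made up for the example.

```python
import xml.etree.ElementTree as ET

# One element with an unprefixed attribute and a prefixed attribute.
element = ET.fromstring(
    '<test xmlns="http://www.foo.com" xmlns:so="http://stackoverflow.com" '
    'att="hello" so:att="world" />'
)

# The element itself picks up the default namespace.
print(element.tag)

# The unprefixed attribute stays without a namespace; the prefixed one does not.
attribute_names = sorted(element.attrib)
print(attribute_names)
```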

Phew. Now, onto W3C XML Schema. How does all of this relate to it?

Well, for starters XML Schema itself is a markup language defined in XML. So it stands to reason that it gets its own namespace. And that namespace is officially http://www.w3.org/2001/XMLSchema. If you write that S as lower case, it's wrong. Starting to see why some people really hate namespaces?

The following three documents are exactly the same:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</xsd:schema>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>

<schema xmlns="http://www.w3.org/2001/XMLSchema">
</schema>

All that matters is that we're using stuff from the XML Schema namespace. As a convention, however, people tend to use prefix xs or xsd in XML Schemas.
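
As a quick sanity check, the following Python sketch (assuming the standard library's xml.etree.ElementTree) parses the three variants and collects the name of each root element. It also tries a variant with a lower-case s in the namespace URI to show that it really is a different namespace.

```python
import xml.etree.ElementTree as ET

variants = [
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"></xsd:schema>',
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema"></schema>',
]

# A set collapses identical names, so three equal roots leave one entry.
root_names = {ET.fromstring(text).tag for text in variants}
print(root_names)

# Deliberately wrong: lower-case "s" in the namespace URI.
wrong_case = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/xmlschema"></xs:schema>'
)
print(wrong_case.tag in root_names)
```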

When we have an XML document, we may wish to specify where the schema(s) for it are located. More than one schema can be relevant for an XML document, because as we've stated languages can be mixed in XML. In order to say that an XML document is an instance of a schema, once again there's a special namespace available: http://www.w3.org/2001/XMLSchema-instance. By convention, we tend to bind this namespace to prefix xsi. But again, this isn't mandatory.

There are a couple of attributes defined in that schema instance namespace. Among them are schemaLocation and noNamespaceSchemaLocation. Take a look at this document:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.foo.com/schema">
</root>

What happened there? First we stated that we're binding prefix xsi to the namespace http://www.w3.org/2001/XMLSchema-instance. Then we used an attribute within that namespace: noNamespaceSchemaLocation. That attribute tells us where the schema is located to validate those parts of the document that aren't in any particular namespace. The following XML document is exactly the same, semantically:

<root xmlns:huh="http://www.w3.org/2001/XMLSchema-instance" huh:noNamespaceSchemaLocation="http://www.foo.com/schema">
</root>

Remember, prefix names have no meaning. They're placeholders. So, what's with that noNamespaceSchemaLocation attribute? Basically, it tells us where we can locate a schema. Now contrary to a namespace URI, this is most definitely something that can be used to fetch stuff from a network or local storage. An XML processor that validates against a schema declared in a document might try to obtain it.

Then there's the fact that it's called noNamespaceSchemaLocation. A schema defines a "target namespace". What this does is state what namespace the elements and attributes it defines are part of. But the target namespace may be omitted. In that case, we've got a schema for XML documents without namespace. Such a schema can be referred to with noNamespaceSchemaLocation.
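
Here is a small Python sketch of how a tool might read that attribute regardless of the prefix used. It assumes the standard library's xml.etree.ElementTree; the helper no_namespace_schema is a hypothetical name, and both documents use the same location so that they are truly equivalent. Note that it only reads the hint. It does not fetch or validate anything.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

with_xsi = ET.fromstring(
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="http://www.foo.com/schema"></root>'
)
with_huh = ET.fromstring(
    '<root xmlns:huh="http://www.w3.org/2001/XMLSchema-instance" '
    'huh:noNamespaceSchemaLocation="http://www.foo.com/schema"></root>'
)


def no_namespace_schema(root):
    """Look the attribute up by its expanded name, so the prefix is irrelevant."""
    return root.get("{%s}noNamespaceSchemaLocation" % XSI)


hint_xsi = no_namespace_schema(with_xsi)
hint_huh = no_namespace_schema(with_huh)
print(hint_xsi, hint_huh)

# The root element itself is in no namespace, hence the bare tag.
print(with_xsi.tag)
```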

In many cases, a schema will actually define a namespace. In order to say which schema belongs with which namespace, we can use another attribute from the http://www.w3.org/2001/XMLSchema-instance namespace: schemaLocation. That attribute can contain pairs (separated by spaces) of namespace URIs and schema URIs. Suppose we have a schema for namespace http://www.foo.com located at http://www.myschemas.com/foo-schema. Then we can state that as follows:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.foo.com http://www.myschemas.com/foo-schema">
</root>

Here's an example with multiple namespace-location pairs:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.foo.com http://www.myschemas.com/foo-schema http://www.bar.com http://www.randomschemas.com/bar-schema">
</root>
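
The pair structure is easy to miss when the attribute is one long string. As a rough sketch (assuming Python's standard xml.etree.ElementTree, with schema_locations as a hypothetical helper name), this is how the value could be split into namespace-to-location pairs:

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

document = (
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.foo.com http://www.myschemas.com/foo-schema '
    'http://www.bar.com http://www.randomschemas.com/bar-schema"></root>'
)


def schema_locations(root):
    """Map each namespace URI to its schema location hint."""
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # Even positions are namespace URIs, odd positions are locations.
    return dict(zip(tokens[0::2], tokens[1::2]))


locations = schema_locations(ET.fromstring(document))
for namespace, location in locations.items():
    print(namespace, "->", location)
```

Any run of whitespace separates the tokens, so the attribute value can be wrapped over several lines in a real document.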

What you need to remember here is that http://www.w3.org/2001/XMLSchema-instance stuff is for use in XML documents that are instances of schemas. The namespace http://www.w3.org/2001/XMLSchema is the one used for defining schemas themselves.


So by now we're up to our neck in URIs and weird-looking attributes with special meanings. That's the thing with namespaces: they look very complex until you figure out how simple they are. Just keep a close eye on what prefix is bound to what namespace URI, and know what that URI defines.

There are two more things about schemas I need to address for your question: xs:import and xs:include. Notice how I've used the xs prefix convention here, since we're talking about W3C XML Schema.

The include element can be used to combine schemas with the same target namespace. Basically it allows us to modularize schemas into smaller parts and put them together.

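The import element does sort of the same, but for schemas with different target namespaces. This allows us to combine schemas for different markup languages. Both elements can carry a plain, unprefixed schemaLocation attribute holding a single location hint for the schema being pulled in. That is a different attribute from xsi:schemaLocation, which lives in the schema instance namespace, appears in instance documents, and holds namespace-location pairs.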
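
To make the two elements concrete, here is a hedged sketch of a schema header that uses both, parsed with Python's standard xml.etree.ElementTree. The file names foo-types.xsd and bar-schema.xsd are hypothetical, and nothing is loaded from them. The code only lists what the schema says it wants to pull in.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Illustrative schema; the referenced .xsd file names are made up.
schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.foo.com">
    <xs:include schemaLocation="foo-types.xsd" />
    <xs:import namespace="http://www.bar.com" schemaLocation="bar-schema.xsd" />
</xs:schema>
"""

schema = ET.fromstring(schema_text)

references = []
for child in schema:
    if child.tag == "{%s}include" % XS:
        # An included schema shares the target namespace of the including one.
        namespace = schema.get("targetNamespace")
        references.append(("include", namespace, child.get("schemaLocation")))
    elif child.tag == "{%s}import" % XS:
        # An imported schema names its own, different namespace.
        namespace = child.get("namespace")
        references.append(("import", namespace, child.get("schemaLocation")))

for kind, namespace, location in references:
    print(kind, namespace, location)
```

Notice that schemaLocation here is looked up without any namespace, because on xs:include and xs:import it is an ordinary unprefixed attribute.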


So to recap:

  • xmlns: used to specify a default namespace.
  • xmlns:prefix: used to bind a namespace to prefix.
  • http://www.w3.org/2001/XMLSchema: the namespace for XML Schema. By convention often bound to prefix xs, but this is not mandatory nor is it done automatically.
  • http://www.w3.org/2001/XMLSchema-instance: the namespace that defines a bunch of things useful for declaring the details of how an XML document is an instance of a schema. By convention often bound to prefix xsi, but this is not mandatory nor is it done automatically.
  • targetNamespace: an attribute that can be used in XML Schema (on the root element) to specify for which namespace this is a schema definition.
  • schemaLocation: one of the attributes defined by namespace http://www.w3.org/2001/XMLSchema-instance, used to indicate where one or more schemas can be found for one or more namespaces.
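
Putting the recap together, the sketch below pairs a tiny schema with a matching instance document. Everything in it is illustrative: the file name foo-schema.xsd is hypothetical, and Python's standard xml.etree.ElementTree is only used to inspect the two headers. It does not perform schema validation, which needs a validating tool.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The schema: xs is bound to the XML Schema namespace, and targetNamespace
# says which namespace the declared "root" element belongs to.
schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.foo.com">
    <xs:element name="root" type="xs:string" />
</xs:schema>
"""

# The instance: default namespace matches the target namespace, and
# xsi:schemaLocation hints at where a schema for that namespace lives.
instance_text = """
<root xmlns="http://www.foo.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.foo.com foo-schema.xsd">hello</root>
"""

schema = ET.fromstring(schema_text)
instance = ET.fromstring(instance_text)

target_namespace = schema.get("targetNamespace")
declared_names = [
    "{%s}%s" % (target_namespace, declaration.get("name"))
    for declaration in schema.findall("{%s}element" % XS)
]

tokens = instance.get("{%s}schemaLocation" % XSI).split()
hints = dict(zip(tokens[0::2], tokens[1::2]))

root_is_declared = instance.tag in declared_names
has_location_hint = target_namespace in hints
print(root_is_declared, has_location_hint)
```

If the instance used a different default namespace, or the schema a different targetNamespace, the two names would no longer match, and that mismatch is a common reason a validator reports that it cannot find a declaration for the root element.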

My final advice: find some convenient way to validate documents against schemas and play around a bit. Experiment with namespaces, includes and imports. Make documents using multiple namespaces and try out the scoping.

After that, check the specifications of XML itself, XML namespaces and XML Schema. It's hardcore reading, but if you make your way through it you'll gain an understanding that many people still seem to miss after years of using XML. Eventually it'll all make sense.

Good luck!

answered Oct 24 '22 17:10 by G_H