
Is the XML declaration node mandatory?

Tags: xml, naming

I had a discussion with a colleague of mine about the XML declaration node (I'm talking about this => <?xml version="1.0" encoding="UTF-8"?>).

I believe that for something to be called "valid XML", it requires an XML declaration node.

My colleague states that the XML declaration node is optional, since the default encoding is UTF-8 and the version is always 1.0. This makes sense, but what does the standard say?

In short, given the following file:

<books>
  <book id="1"><title>Title</title></book>
</books>

Can we say that:

  1. Is it valid XML?
  2. Is it a valid XML node?
  3. Is it a valid XML document?

Thank you very much.

ereOn asked Jan 13 '11 10:01



2 Answers

This:

<?xml version="1.0" encoding="UTF-8"?>

is not a processing instruction - it is the XML declaration. Its purpose is to configure the XML parser correctly before it starts reading the rest of the document.

It looks like a processing instruction, but unlike a real processing instruction it will not be part of the DOM the parser creates.
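As a rough illustration of that difference, here is a minimal sketch using Python's standard `xml.dom.minidom`. The `<?render mode="draft"?>` processing instruction and the `top_level_nodes` helper are made up for this example; the behavior shown is that of this one parser.

```python
from xml.dom import minidom

# An XML declaration, then a made-up processing instruction, then the root element.
SOURCE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<?render mode="draft"?>'
    b'<books><book id="1"><title>Title</title></book></books>'
)


def top_level_nodes(xml_bytes):
    """Return (nodeType, nodeName) for each top-level child of the parsed document."""
    doc = minidom.parseString(xml_bytes)
    return [(node.nodeType, node.nodeName) for node in doc.childNodes]


nodes = top_level_nodes(SOURCE)

# The processing instruction and the root element show up as nodes;
# the XML declaration does not.
for node_type, name in nodes:
    kind = "processing instruction" if node_type == minidom.Node.PROCESSING_INSTRUCTION_NODE else "element"
    print(kind, name)
```

The parser consumes the declaration to configure itself, so only the processing instruction and the `books` element appear among the document's children.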

It is not necessary for "valid" XML. "Valid" means "represents a well-defined document type, as described in a DTD or a schema". Without a schema or DTD the word "valid" has no meaning.

Many people misuse "valid" when they really mean "well-formed". A well-formed XML document is one that obeys the basic syntax rules of XML.

There is no XML declaration necessary for a document to be well-formed, either, since there are defaults for both version and encoding (1.0 and UTF-8/UTF-16, respectively). If a Unicode BOM (Byte Order Mark) is present in the file, it determines the encoding. If there is no BOM and no XML declaration, UTF-8 is assumed.
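To see these defaults in action, the sketch below feeds the same document to Python's standard `xml.etree.ElementTree` (backed by expat) in several encodings. The sample title `Café` and the `read_title` helper are assumptions made for the example, and other parsers may behave differently in edge cases.

```python
import codecs
import xml.etree.ElementTree as ET

# Sample document without an XML declaration; the non-ASCII character
# makes the choice of encoding matter.
BODY = '<books><book id="1"><title>Caf\u00e9</title></book></books>'


def read_title(xml_bytes):
    """Parse the bytes and return the first title, or None if parsing fails."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None
    return root.findtext("book/title")


candidates = {
    "UTF-8, no BOM, no declaration": BODY.encode("utf-8"),
    "UTF-8 with BOM, no declaration": codecs.BOM_UTF8 + BODY.encode("utf-8"),
    # Python's "utf-16" codec prepends a BOM.
    "UTF-16 with BOM, no declaration": BODY.encode("utf-16"),
    "Latin-1, no declaration": BODY.encode("latin-1"),
    "Latin-1, declared": b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    + BODY.encode("latin-1"),
}

for label, data in candidates.items():
    print(f"{label}: {read_title(data)!r}")
```

In this sketch only the undeclared Latin-1 input should fail to parse: without a BOM or a declaration the bytes are read as UTF-8, and they are not valid UTF-8.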

Here is a canonical thread on how encoding declaration and detection work in XML files: "How default is the default encoding (UTF-8) in the XML Declaration?"


To your questions:

  1. Is it valid XML?
    This cannot be answered without a DTD or a schema. It is well-formed, though.
  2. Is it a valid XML node?
    A node is a concept that is related to an in-memory representation of a document (a DOM). This snippet can be parsed into a node, since it is well-formed.
  3. Is it a valid XML document?
    See #1.

You are confusing a few XML concepts here (not to worry, this confusion is common and stems partly from the fact that the concepts overlap and the names are misused rather often).

  • It all starts with structured data consisting of names, values and attributes that is organized as a tree.
  • XML means, most basically, a syntax to represent this structured data in textual form (it's a "Markup Language"). It is what you get when you serialize the tree into a string of characters and it can be used to de-serialize a string of characters into a tree again.
  • Document usually refers to a string of characters that represent a serialized tree. It can be stored in a file, sent over the network or created in-memory.
  • The rules of serialization and de-serialization are very strictly defined. A document (a "string of characters") that can successfully be de-serialized into a tree is said to be well-formed.
  • The semantics of such a tree (allowed elements, element count and order, namespaces, any number of complex rules, really) can be defined in what is called a DTD or a schema. If a tree obeys a certain set of well-defined semantics, it is said to be valid.
  • The term Document Object Model (DOM) refers to the standardized in-memory representation of structured data. It's the name of a well-defined API to access this tree with standardized methods.
  • A node is the basic data structure of a Document Object Model.
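
The distinction between well-formed and valid can be sketched in a few lines of Python. The standard library does not validate against a DTD or schema, so `obeys_toy_rules` below is a hypothetical, hand-written stand-in for one (it assumes `<books>` may only contain `<book>` elements that carry an `id` attribute and exactly one `<title>`); a real setup would use an actual DTD or schema with a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text can be de-serialized into a tree at all."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def obeys_toy_rules(text):
    """Hypothetical stand-in for a DTD/schema check (not a real validator)."""
    root = ET.fromstring(text)
    if root.tag != "books":
        return False
    return all(
        child.tag == "book"
        and "id" in child.attrib
        and len(child.findall("title")) == 1
        for child in root
    )


good = '<books><book id="1"><title>Title</title></book></books>'
missing_id = '<books><book><title>Title</title></book></books>'
mismatched = '<books><book id="1"><title>Title</title></book></book>'

print(is_well_formed(good), obeys_toy_rules(good))
print(is_well_formed(missing_id), obeys_toy_rules(missing_id))
print(is_well_formed(mismatched))
```

`missing_id` parses into a tree but breaks the toy rules, which mirrors "well-formed but not valid"; `mismatched` never becomes a tree, so the question of validity does not even arise for it.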
Tomalak answered Oct 16 '22 06:10


According to the Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation of 26 November 2008, section http://www.w3.org/TR/2008/REC-xml-20081126/#sec-prolog-dtd:
without an XML declaration, it is not valid (even though it is well-formed and complete).

peenut answered Oct 16 '22 05:10