I had a discussion with a colleague of mine about the XML declaration node (I'm talking about this: <?xml version="1.0" encoding="UTF-8"?>).
I believe that for a document to be called "valid XML", it requires an XML declaration node.
My colleague states that the XML declaration node is optional, since the default encoding is UTF-8 and the version is always 1.0. This makes sense, but what does the standard say?
In short, given the following file:
<books>
<book id="1"><title>Title</title></book>
</books>
Can we say that:
Thank you very much.
The XML declaration is mandatory if the encoding of the document is anything other than UTF-8 or UTF-16. In practice, this means that documents encoded in US-ASCII can also omit the XML declaration, because US-ASCII is a subset of UTF-8: every US-ASCII document is also a valid UTF-8 document. Only one encoding can be used for an entire XML document.
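To see the encoding rule in action, here is a small sketch using Python's standard library (xml.etree.ElementTree, which sits on the non-validating expat parser). The helper try_parse and the sample documents are made up for illustration: a Latin-1 byte for "é" is rejected when there is no declaration, because the parser falls back to UTF-8, and it is accepted once the declaration names ISO-8859-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the same text, encoded as ISO-8859-1 (Latin-1).
LATIN1_NO_DECL = "<name>café</name>".encode("iso-8859-1")
LATIN1_WITH_DECL = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>café</name>'
).encode("iso-8859-1")
ASCII_NO_DECL = b"<name>cafe</name>"  # pure US-ASCII, which is also valid UTF-8


def try_parse(data: bytes):
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"error: {err}"


for label, doc in [
    ("Latin-1, no declaration", LATIN1_NO_DECL),
    ("Latin-1, with declaration", LATIN1_WITH_DECL),
    ("US-ASCII, no declaration", ASCII_NO_DECL),
]:
    print(f"{label}: {try_parse(doc)}")
```

Without the declaration, the 0xE9 byte is not valid UTF-8, so the parser reports an error instead of guessing Latin-1.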
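The XML declaration contains details that prepare an XML processor to parse the XML document. It is optional, but when used, it must be the very first thing in the document: nothing may come before it, not even whitespace or a comment (only a byte order mark is allowed, since that is an encoding signature rather than part of the document's content).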
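A quick way to check the placement rule, again assuming Python's expat-based xml.etree.ElementTree; declaration_error is an illustrative helper, not a library function. Anything before the declaration, even a single newline, makes the parse fail, while a UTF-8 byte order mark in front of it is accepted.

```python
import codecs
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="UTF-8"?>'


def declaration_error(data: bytes):
    """Return None if the document parses, otherwise the parser's message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


cases = {
    "declaration first": DECL + b"<books/>",
    "BOM, then declaration": codecs.BOM_UTF8 + DECL + b"<books/>",
    "newline, then declaration": b"\n" + DECL + b"<books/>",
    "comment, then declaration": b"<!-- note -->" + DECL + b"<books/>",
}
for label, doc in cases.items():
    print(f"{label}: {declaration_error(doc) or 'ok'}")
```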
The XML declaration provides basic information about the format of the rest of the XML document. It takes the form of a processing instruction and can have the attributes version, encoding and standalone.
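If you want to see what a parser actually extracts from the declaration, Python's low-level xml.parsers.expat module reports it through its XmlDeclHandler callback. The wrapper read_xml_declaration below is a hypothetical helper: it returns the version, encoding and standalone values, or None when the document has no declaration at all.

```python
from xml.parsers import expat


def read_xml_declaration(data: bytes):
    """Return (version, encoding, standalone) from the XML declaration,
    or None if the document does not have one."""
    found = []

    def on_xml_decl(version, encoding, standalone):
        # expat passes encoding as None when it is not declared, and
        # standalone as -1 (absent), 0 ("no") or 1 ("yes").
        found.append((version, encoding, standalone))

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)
    return found[0] if found else None


print(read_xml_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>'))
print(read_xml_declaration(b'<?xml version="1.0"?><a/>'))
print(read_xml_declaration(b"<a/>"))
```

The callback simply never fires for the last document, which is how a parser "sees" a missing declaration.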
This:
<?xml version="1.0" encoding="UTF-8"?>
is not a processing instruction - it is the XML declaration. Its purpose is to configure the XML parser correctly before it starts reading the rest of the document.
It looks like a processing instruction, but unlike a real processing instruction it will not be part of the DOM the parser creates.
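One way to confirm this, assuming Python's xml.dom.minidom: parse a document that has both the declaration and a real processing instruction (the my-app target below is invented) and list the document's top-level nodes. The processing instruction shows up as a node; the declaration does not.

```python
from xml.dom import minidom

SOURCE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<?my-app mode="debug"?>'  # a real (hypothetical) processing instruction
    b"<books/>"
)

doc = minidom.parseString(SOURCE)

# Collect the kind and name of every node directly under the Document.
top_level = []
for node in doc.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        top_level.append(("processing instruction", node.target))
    elif node.nodeType == node.ELEMENT_NODE:
        top_level.append(("element", node.tagName))

print(top_level)
```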
It is not necessary for "valid" XML. "Valid" means "represents a well-defined document type, as described in a DTD or a schema". Without a schema or DTD the word "valid" has no meaning.
Many people mis-use "valid" when they really mean "well-formed". A well-formed XML document is one that obeys the basic syntax rules of XML.
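Well-formedness is what an ordinary non-validating parser checks. As a sketch, assuming Python's xml.etree.ElementTree (expat does not validate against a DTD or schema, so checking validity would need a separate validating tool, not shown here), the hypothetical helper is_well_formed accepts a document shaped like the one in the question with no XML declaration, and rejects a variant whose last end tag does not match its start tag.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """True if a non-validating parser accepts the document (no DTD/schema check)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Neither document has an XML declaration.
GOOD = b'<books>\n<book id="1"><title>Title</title></book>\n</books>'
BAD = b'<books>\n<book id="1"><title>Title</title></book>\n</book>'  # end tag mismatch

print(is_well_formed(GOOD), is_well_formed(BAD))
```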
No XML declaration is necessary for a document to be well-formed either, since there are defaults for both version and encoding (1.0 and UTF-8/UTF-16, respectively). If a Unicode BOM (Byte Order Mark) is present in the file, it determines the encoding. If there is no BOM and no XML declaration, UTF-8 is assumed.
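The detection order described above (BOM first, then the declaration's encoding, then the UTF-8 default) can be sketched in a few lines of Python. This is a simplified, hypothetical helper, not what any particular parser does internally: it only knows the UTF-8 and UTF-16 byte order marks and assumes an ASCII-compatible encoding when it looks for the declaration.

```python
import codecs
import re

# Byte order marks this sketch recognises (UTF-32 is deliberately left out).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# encoding="..." or encoding='...' inside a leading <?xml ... ?> declaration
DECL_ENCODING = re.compile(
    rb"""^<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_encoding(data: bytes) -> str:
    """Guess a document's encoding: BOM, then declaration, then UTF-8."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    match = DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii")
    return "utf-8"  # no BOM and no encoding declaration


print(guess_xml_encoding(b"<books/>"))
print(guess_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><books/>'))
print(guess_xml_encoding(codecs.BOM_UTF16_BE + "<books/>".encode("utf-16-be")))
```

A real parser goes further (for example, recognising UTF-16 without a BOM from the byte pattern of "<?xml"), so treat this only as an illustration of the order of precedence.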
Here is a canonical thread on how encoding declaration and detection work in XML files: How default is the default encoding (UTF-8) in the XML Declaration?
To your questions:
You are confusing a few XML concepts here (not to worry, this confusion is common and stems partly from the fact that the concepts overlap and names are mis-used rather often).
According to the Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation 26 November 2008, section http://www.w3.org/TR/2008/REC-xml-20081126/#sec-prolog-dtd, without an XML declaration the document is not valid (even though it is well-formed and complete).