The XML declaration is mandatory if the encoding of the document is anything other than UTF-8 or UTF-16. In practice, this means that documents encoded using US-ASCII can also omit the XML declaration because US-ASCII overlaps entirely with UTF-8. Only one encoding can be used for an entire XML document.
Well Formed XML DocumentsXML documents must have a root element. XML elements must have a closing tag. XML tags are case sensitive. XML elements must be properly nested.
An XML declaration is a processing instruction that identifies a document as XML. DITA documents must begin with an XML declaration.
xml version = "version_number" encoding = "encoding_declaration" standalone = "standalone_status" ?> Specifies the version of the XML standard used. It defines the character encoding used in the document. UTF-8 is the default encoding used.
In XML 1.0, the XML Declaration is optional. See section 2.8 of the XML 1.0 Recommendation, where it says it "should" be used -- which means it is recommended, but not mandatory. In XML 1.1, however, the declaration is mandatory. See section 2.8 of the XML 1.1 Recommendation, where it says "MUST" be used. It even goes on to state that if the declaration is absent, that automatically implies the document is an XML 1.0 document.
Note that in an XML Declaration the encoding
and standalone
are both optional. Only the version
is mandatory. Also, these are not attributes, so if they are present they must be in that order: version
, followed by any encoding
, followed by any standalone
.
<?xml version="1.0"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" standalone="yes"?>
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
If you don't specify the encoding in this way, XML parsers try to guess what encoding is being used. The XML 1.0 Recommendation describes one possible way character encoding can be autodetected. In practice, this is not much of a problem if the input is encoded as UTF-8, UTF-16 or US-ASCII. Autodetection doesn't work when it encounters 8-bit encodings that use characters outside the US-ASCII range (e.g. ISO 8859-1) -- avoid creating these if you can.
The standalone
indicates whether the XML document can be correctly processed without the DTD or not. People rarely use it. These days, it is a bad to design an XML format that is missing information without its DTD.
Update:
A "prolog error/invalid utf-8 encoding" error indicates that the actual data the parser found inside the file did not match the encoding that the XML declaration says it is. Or in some cases the data inside the file did not match the autodetected encoding.
Since your file contains a byte-order-mark (BOM) it should be in UTF-16 encoding. I suspect that your declaration says <?xml version="1.0" encoding="UTF-8"?>
which is obviously incorrect when the file has been changed into UTF-16 by NotePad. The simple solution is to remove the encoding
and simply say <?xml version="1.0"?>
. You could also edit it to say encoding="UTF-16"
but that would be wrong for the original file (which wasn't in UTF-16) or if the file somehow gets changed back to UTF-8 or some other encoding.
Don't bother trying to remove the BOM -- that's not the cause of the problem. Using NotePad or WordPad to edit XML is the real problem!
Xml declaration is optional so your xml is well-formed without it. But it is recommended to use it so that wrong assumptions are not made by the parsers, specifically about the encoding used.
It is only required if you aren't using the default values for version
and encoding
(which you are in that example).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With