Looking at the XML header
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
Am I right to state that the encoding
attribute is
Or is that attribute not about the content of the stream?
Am I mixing up things here?
Encoding matters in XML because a document must declare its encoding correctly if it is to be transferred between platforms. According to the XML 1.0 specification, every XML processor must be able to read both UTF-8 and UTF-16.
version="1.0" means that this is the XML standard this file conforms to. encoding="utf-8" means that the file is encoded using the UTF-8 Unicode encoding.
You can write the XML file in any text editor. For non-ASCII characters, such as characters with diacritics and Kanji characters, an editor that can save the file as UTF-8 is required. Because UTF-8 is not easily displayed or edited on z/OS®, the XML can be encoded in UTF-8 or using the agent's code page.
UTF-8 is the default character encoding for XML documents: when there is no encoding declaration, no byte order mark, and no external encoding information, a processor assumes UTF-8. UTF-8 is also the default or recommended encoding in much of the web stack, including HTML5 and CSS.
As you mentioned, you'd have to know the encoding of the file to read the encoding attribute.
However, there is a heuristic that can easily get you close enough to the "real" encoding to allow you to read the encoding attribute. This works because the <?xml part by definition can only contain characters in the ASCII range (however they are encoded).
The XML standard even describes the exact process used to find out the encoding.
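A rough sketch of that process in Python, loosely following the non-normative autodetection appendix of the XML 1.0 specification. The function names (`guess_family`, `sniff_xml_encoding`) are made up for this illustration, EBCDIC and the unusual UTF-32 byte orders are left out, and the regular expression is a simplification of the real grammar, so treat it as an approximation rather than a conforming implementation.

```python
import re

# Byte order marks, checked longest first so UTF-32 is not mistaken for UTF-16.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# How the start of "<?xml" looks without a BOM in each encoding family.
_NO_BOM = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"<?xm", "ascii"),  # some ASCII-compatible encoding, not yet known which
]

# Simplified pattern for the encoding pseudo-attribute of the XML declaration.
_DECL = re.compile(
    r'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_family(data):
    """Return (provisional codec, number of BOM bytes to skip)."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    for prefix, codec in _NO_BOM:
        if data.startswith(prefix):
            return codec, 0
    return "utf-8", 0  # no declaration and no BOM: XML defaults to UTF-8


def sniff_xml_encoding(data):
    """Guess the family from the first bytes, then read the declared label."""
    codec, skip = guess_family(data)
    # The declaration is pure ASCII, so the provisional codec is good enough
    # to decode it even when the rest of the document uses another charset.
    head = data[skip:skip + 400].decode(codec, errors="replace")
    match = _DECL.match(head)
    if match:
        return match.group(1)
    return "utf-8" if codec == "ascii" else codec
```

For the header from the question, `sniff_xml_encoding('<?xml version="1.0" encoding="UTF-16" standalone="no"?><a/>'.encode("utf-16"))` first recognises the UTF-16 byte order mark and only then reads the label `UTF-16` from the declaration.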
And the encoding label isn't redundant either. For example, if you use the algorithm in the XML spec to find out that some ASCII-based (or ASCII-compatible) encoding is used, you still need to read the encoding label to find out which one is actually used (valid candidates would be ASCII, UTF-8, any of the ISO-8859-* encodings, any of the Windows-* encodings, KOI8-R and many, many others). For the <?xml part itself it won't make a difference which one it is, but for the rest of the document, it can make a huge difference.
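To illustrate the point, here is a small Python sketch that decodes the same bytes with several ASCII-compatible codecs. The sample document and the list of candidate codecs are invented for the example; the declaration comes out identical every time, while the body does not.

```python
# One non-ASCII byte (0xA4) in the body; the declaration is pure ASCII.
raw = b'<?xml version="1.0"?><price>\xa4 10</price>'

candidates = ["iso-8859-1", "iso-8859-15", "cp1252", "koi8-r"]

decoded = {}
for name in candidates:
    text = raw.decode(name)
    # Split the declaration from the rest of the document.
    prolog, _, body = text.partition("?>")
    decoded[name] = (prolog + "?>", body)

# The declaration decodes to the same string under every candidate...
prologs = {prolog for prolog, _ in decoded.values()}

# ...but the body depends on which encoding was really meant.
bodies = {name: body for name, (_, body) in decoded.items()}
```

Here `prologs` ends up holding a single string, while `bodies["iso-8859-1"]` contains the generic currency sign and `bodies["iso-8859-15"]` contains the euro sign for the very same byte.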
Regarding mislabeled XML files: yes, they are easy to produce. However, the XML spec clearly states that such files are not well-formed and as such are not correct XML. An incorrect encoding must be reported as an error (as long as it can be detected!). So it's the problem of whoever is producing the XML.
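The "as long as they can be detected" caveat can be seen with Python's standard-library parser. This is only an illustration with two made-up sample documents and a hypothetical helper `parses`; other parsers may word their errors differently.

```python
import xml.etree.ElementTree as ET


def parses(data):
    """Return True if the standard-library parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Labeled UTF-8, but a lone 0xE9 byte is not a valid UTF-8 sequence,
# so the parser can detect the mismatch and reject the document.
detectable = b'<?xml version="1.0" encoding="UTF-8"?><a>\xe9</a>'

# Labeled ISO-8859-1, but the bytes are really the UTF-8 form of "é".
# Every byte is a valid Latin-1 character, so nothing can be detected
# and the text silently comes out wrong.
undetectable = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xc3\xa9</a>'

rejected = not parses(detectable)
garbled_text = ET.fromstring(undetectable).text
```

The first document is refused outright, while the second one parses without complaint and yields two Latin-1 characters where the producer meant a single "é".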