
Confused about XHTML5: no more `<?xml?>` and now mandatory `meta`?

I've been a longtime user of XHTML 1.0 Strict, and I'm now trying to switch to XHTML5 in my new projects.

I'm confused that `<?xml version='1.0' encoding='utf-8'?>` is no longer considered valid for HTML5 by http://validator.w3.org/. Why is that? Isn't that what all XML documents are supposed to start with?

And when I remove the standard <?xml…, my document still doesn't validate: now it's missing the encoding. I don't like those meta tags, but are they now effectively mandatory, to specify the encoding, in order to be valid (X)HTML5?

asked May 17 '13 at 02:05 by cnst




1 Answer

An XML declaration is allowed, and validates, in the XHTML serialization of HTML5. The following rather minimal document validates:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title></head>
<body></body>
</html>

However, this only applies to the XHTML serialization (XHTML syntax) of HTML5. In the HTML serialization, an XML declaration is not allowed. If you save the above document to a file and put it on a server that sends it with Content-Type: text/html (which normally happens if the filename ends with “.html”), then you get an error message:

Saw <?. Probable cause: Attempt to use an XML processing instruction in HTML.
(XML processing instructions are not supported in HTML.)

Here “HTML” means HTML serialization only.
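
To make the distinction concrete, here is a minimal Python sketch of the idea that the Content-Type header, not the file contents, selects the syntax. The function name `syntax_for` and the set of XML media types it checks are illustrative assumptions, not taken from the validator or from any browser.

```python
from email.message import Message


def syntax_for(content_type: str) -> str:
    """Return "html", "xml" or "other" for a Content-Type header value.

    Hypothetical helper: a simplified model of the rule described above.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lowercased, parameters stripped

    if media_type == "text/html":
        return "html"  # HTML syntax: an XML declaration is reported as an error
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"  # XML syntax: an XML declaration is allowed
    return "other"


print(syntax_for("text/html; charset=utf-8"))
print(syntax_for("application/xhtml+xml"))
```

Under this model the same file is checked as HTML syntax when served as text/html and as XHTML syntax when served as application/xhtml+xml.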

Browsers do not care about an XML declaration in either syntax. In HTML syntax, it is just ignored, as a recoverable syntax error. In XHTML syntax, it does not matter, except for the encoding part.

Although the XML 1.0 specification recommends (but does not require) an XML declaration, it would in practice matter (apart from the significance of encoding) only to software that is capable of processing different versions of XML. Browsers aren’t. And in addition to XML 1.0, there’s just XML 1.1, which is not used much. Besides, HTML5 is defined so that the XML version used in XHTML syntax is XML 1.0.

The encoding part may matter, but utf-8 is the default anyway for XML. If you use another encoding for some reason, then an XML declaration may be useful to prevent any conflicts. HTML5 CR says this in its discussion of encodings: “In XHTML, the XML declaration should be used for inline character encoding information, if necessary.” A meta tag cannot really help in XHTML when served with an XML content type, since the encoding has already been decided (by defaulting to UTF-8 or otherwise) when the tag is seen.
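
The effect of the encoding part can be tried with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, chosen here only for illustration; the variable names are made up. It feeds the same ISO-8859-1 bytes to the parser with and without an XML declaration.

```python
import xml.etree.ElementTree as ET

# "café" encoded as ISO-8859-1: the é is the single byte 0xE9.
latin1_body = "<p>caf\u00e9</p>".encode("iso-8859-1")

# With an XML declaration naming the encoding, the bytes decode correctly.
declared = b"<?xml version='1.0' encoding='iso-8859-1'?>" + latin1_body
text_declared = ET.fromstring(declared).text

# Without a declaration the parser assumes UTF-8, and 0xE9 followed by "<"
# is not a valid UTF-8 sequence, so parsing fails.
try:
    ET.fromstring(latin1_body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# UTF-8 bytes need no declaration at all.
text_utf8 = ET.fromstring("<p>caf\u00e9</p>".encode("utf-8")).text

print(text_declared, undeclared_ok, text_utf8)
```

This only shows what happens when no encoding information comes from outside the document; a charset given in the HTTP headers is a separate matter.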

For HTML syntax, the <meta charset=...> tag may be used, but it is not needed for validity, and the encoding can be specified in HTTP headers (which override any meta tags). Using a meta tag may however be helpful, since a page might be saved locally, and then there won’t be any HTTP headers available when it is opened.
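
The precedence described here can be sketched in a few lines of Python. This is a deliberately simplified model, not the full encoding detection that browsers perform: it ignores byte order marks and the `http-equiv` form of the meta tag, and the names `pick_encoding` and `META_CHARSET` are hypothetical.

```python
import re
from email.message import Message
from typing import Optional

# Simplified pattern for <meta charset=...>; the http-equiv form is ignored.
META_CHARSET = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def pick_encoding(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the encoding label for an HTML-syntax page, or None if unknown."""
    # 1. A charset in the HTTP Content-Type header wins over any meta tag.
    if content_type is not None:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    # 2. Otherwise look for a meta tag near the start of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Nothing declared: the caller has to fall back to a default or guess.
    return None


page = b'<!DOCTYPE html><meta charset="utf-8"><title>t</title>'
print(pick_encoding("text/html; charset=ISO-8859-1", page))  # header wins
print(pick_encoding(None, page))  # saved locally: only the meta tag is left
```

Passing `None` as the header models the locally saved page: with no HTTP headers available, the meta tag is the only declaration that remains.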

answered Oct 14 '22 at 14:10 by Jukka K. Korpela