
How should user agents handle unrecognized HTML elements?

I have tried to find an answer to this in the W3C HTML specifications, but haven't had any luck so far.

For example, if I have the following HTML code:

<body>
    <p>
        <foo>bar</foo>
    </p>
</body>

Does the W3C specify how a user agent should handle this? For example, should the "foo" element be ignored completely, or should the element itself be ignored while its content "bar" is still parsed?

Also, is it even "legal" to do this?

Edit: Some excellent answers from all of you! I totally agree that it would be bad practice to embed generic XML unless, possibly, if you have complete control over which browser your users will use. I was mostly curious about what actually would or should happen if such markup were to be produced :-)

Christian Palmstierna asked Sep 19 '11 12:09


2 Answers

The HTML spec doesn't say much about it, other than:

The HTMLUnknownElement interface must be used for HTML elements that are not defined by this specification (or other applicable specifications).

This can be verified in conforming browsers using the following JavaScript code in the console:

Object.prototype.toString.call(document.createElement("foo"));
//-> "[object HTMLUnknownElement]"

However, some browsers don't follow the specification here yet. For instance, Chrome 13 gives [object HTMLElement] and IE 8 gives [object HTMLGenericElement] (IE 9 is correct).

As far as I'm aware, all browsers will parse <foo> as an element, but default styling and behaviour are not guaranteed to be the same. Where HTMLUnknownElement is implemented and the spec is followed, it should inherit directly from HTMLElement and, therefore, have many of the default properties found on other elements.
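
As a rough way to check the inheritance described above, the sketch below reports which interfaces an element is an instance of. The helper name describeElement and its scope parameter are made up for illustration; they are not part of any DOM API. The browser-only part is guarded so the snippet does nothing where no DOM is available.

```javascript
// Sketch: report which DOM interfaces a user agent assigns to an element.
// `describeElement` is a hypothetical helper name, not a DOM API.
// `scope` is the object holding the DOM constructors (window in a browser).
function describeElement(el, scope) {
  const Unknown = scope.HTMLUnknownElement;
  return {
    tag: el.tagName,
    // True for any HTML element, recognized or not.
    isHTMLElement: el instanceof scope.HTMLElement,
    // Older browsers may not expose HTMLUnknownElement at all, hence the typeof check.
    isUnknown: typeof Unknown === "function" && el instanceof Unknown,
  };
}

// Only runs where a DOM exists, e.g. when pasted into a browser console.
if (typeof document !== "undefined") {
  console.log(describeElement(document.createElement("foo"), window));
  console.log(describeElement(document.createElement("p"), window));
}
```

In a browser that follows the spec, the "foo" element should report both isHTMLElement and isUnknown as true, while "p" should report isUnknown as false.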

Please note that your HTML will not validate when you have non-standard elements in your markup. It's also worth mentioning that search engine crawlers, screen readers and other software will not be able to extract semantic meaning from these elements.

Further reading:

  • Why generic XML on the web is a bad idea and 386: Generic Elements; Still a Bad Idea - Anne van Kesteren's blog (2005, 2010)
Andy E answered Sep 28 '22 06:09


Some excellent advice from @Andy E. These are just a few additions to that.

The HTML5 draft does define how to parse unknown elements; however, the rules are distinctly non-trivial. They are described at http://dev.w3.org/html5/spec/tree-construction.html

Note that the first version of Firefox to use these rules is Firefox 4, and the first version of IE to use the rules is IE 10. Older versions have a number of different and often very strange behaviours.
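
To see what tree a given browser actually builds for the markup in the question, something like the following sketch can be pasted into its console. The outline helper is a made-up name; it only reads nodeType, nodeName, nodeValue and childNodes, which every DOM node exposes. It assumes DOMParser with "text/html" support is available, which is not the case in the older browsers mentioned above.

```javascript
// Sketch: build an indented outline of a DOM (or DOM-like) tree.
// `outline` is a hypothetical helper, not a DOM API.
function outline(node, depth = 0) {
  const indent = "  ".repeat(depth);
  if (node.nodeType === 3) { // 3 is Node.TEXT_NODE
    const text = node.nodeValue.trim();
    // Skip whitespace-only text nodes to keep the outline readable.
    return text ? [indent + JSON.stringify(text)] : [];
  }
  const lines = [indent + node.nodeName];
  for (const child of Array.from(node.childNodes)) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

// Only runs where DOMParser exists, e.g. a browser console.
if (typeof DOMParser !== "undefined") {
  const doc = new DOMParser().parseFromString("<p><foo>bar</foo></p>", "text/html");
  console.log(outline(doc.body).join("\n"));
}
```

In a browser that implements the HTML5 tree construction rules, the outline should show FOO as a child of P with the text "bar" inside it, i.e. the unknown element is kept in the tree rather than dropped.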

HTML has no notion of "legality", only validity or conformance to a standard. You are free to decide whether you want your pages to conform to any particular standard or not. There is no W3C standard of HTML where use of arbitrarily named elements is conforming.

It is generally advisable to make your HTML conforming to avoid unpredictable errors in browsers and other HTML consumers that you haven't tested against.

Alohci answered Sep 28 '22 04:09