Why aren't browsers strict about HTML? [closed]

It's a well-known fact that browsers will accept invalid HTML and do their best to make sense of it. If you create a web page containing only the following code:

<html>
    <head>
        <title>This is bad HTML</title>
    <body>
        <h1>Bad HTML</h2>
        <p>This is a paragraph
    </body>

then the browser will still parse it and render something acceptable. Whether the result is what you meant depends on how each browser interprets your mistakes.

This, to me, is the same as if JavaScript could be written like this:

if (some_var == 1) {
    say_something("some text');
else {
    do_something_else();
// END OF CODE

which a JavaScript engine written with the same determination to make sense of invalid code could probably parse as you meant, or interpret in its own way and run anyway.

I've seen several articles and questions asking "Is it even worth writing valid HTML?", which present various opinions on the pros and cons of writing valid HTML. However, what this really makes me wonder is:

Why are browsers accepting invalid HTML in the first place?

NOTE: The following are not additional questions; they only give context to the one question I'm asking here:

  • Why aren't browsers strict?

  • Why don't they reject invalid code with errors, as the tools for any programming language do? (Not that I'm calling HTML a programming language, but you get the point.)

  • Wouldn't that force all developers to write HTML code that will be interpreted exactly the same in any browser?

  • If browsers refused to parse invalid markup, wouldn't that effectively result in valid markup everywhere, from anyone wanting to publish content on the web?

  • If this comes from historical reasons and backward compatibility, isn't it time to change, now that we see sites like adsense.google.com refusing compatibility with IE < v10?

EDIT: Those voting to close this question, please reconsider. This is neither a broad question nor an opinion-based one. It's a very specific question on a very specific subject, completely related to the programming world, and it can definitely be answered with a real answer by those who actually know it. Thanks.

Francisco Zarabozo asked Aug 29 '14 00:08



1 Answer

"Why are browsers accepting invalid HTML in the first place?"

For compatibility reasons, and in the case of newer browsers, because HTML5 dictates an algorithm for parsing even invalid documents.

Earlier HTML specifications were ambiguous in many situations, such as what should happen when an unexpected tag is seen or when tags are nested inconsistently, as in <b><i></b></i>. Even so, many documents "just work" because some earlier browsers ignored unexpected tags or even "corrected" incorrect nesting.
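To make the idea of "correcting" incorrect nesting concrete, here is a minimal sketch in Python. It is an illustration only, not the recovery any real browser performs (the HTML5 tree-construction rules are far more involved). It uses the standard library's html.parser as a tolerant tokenizer; the class LenientRebalancer and the function rebalance are hypothetical names invented for this example, and attributes are deliberately dropped to keep it short.

```python
from html.parser import HTMLParser


class LenientRebalancer(HTMLParser):
    """Toy error recovery: re-serialize markup so that tags nest properly.

    Hypothetical helper for illustration; NOT the HTML5 algorithm.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of elements that are currently open
        self.out = []        # pieces of the repaired markup
        self.recovered = 0   # how many mistakes were silently repaired

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch.
        self.open_tags.append(tag)
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            # Stray end tag: ignore it instead of rejecting the document.
            self.recovered += 1
            return
        # Close everything opened after `tag`, then `tag` itself.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break
            self.recovered += 1

    def handle_data(self, data):
        self.out.append(data)

    def close(self):
        super().close()
        # End of input: close whatever the author left open.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
            self.recovered += 1


def rebalance(markup):
    """Return (repaired_markup, number_of_repairs)."""
    parser = LenientRebalancer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out), parser.recovered
```

With this sketch, rebalance("<b><i></b></i>") returns ("<b><i></i></b>", 2): the inner <i> is closed early and the stray </i> is ignored. A different recovery policy would be just as defensible, which is exactly how browsers could end up disagreeing when the specification did not say which one to use.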

But now the HTML5 specification includes a much less ambiguous algorithm for parsing HTML documents. Note that the algorithm includes points where "parse errors" can occur. But these parse errors usually don't stop a modern browser from displaying an HTML document, although the browser is free to display parse errors in its developer tools if it chooses to:

[U]ser agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification. [Emphasis added.]

But again, no modern browser, to my knowledge, aborts parsing a document this early because of parse errors (barring extraordinary situations, such as running out of memory).
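The difference between aborting at the first error and carrying on can be seen with two parsers from the Python standard library. This is only a rough analogy, under the assumption that a strict XML parser stands in for a "strict browser": xml.etree.ElementTree stops at the first well-formedness error, while html.parser (a tolerant tokenizer, not an implementation of the HTML5 algorithm) keeps going. The names BAD_PAGE, TagCollector, parse_strict and parse_lenient are made up for this sketch; the markup is the page from the question.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The invalid page from the question, unchanged.
BAD_PAGE = """<html>
    <head>
        <title>This is bad HTML</title>
    <body>
        <h1>Bad HTML</h2>
        <p>This is a paragraph
    </body>
"""


class TagCollector(HTMLParser):
    """Record every start tag the tolerant tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)


def parse_strict(markup):
    """Return None if the markup is well-formed XML, else the first error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


def parse_lenient(markup):
    """Return the start tags seen; malformed input does not raise."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen
```

Here parse_strict(BAD_PAGE) returns an error message describing the first problem it hits and nothing else about the document, while parse_lenient(BAD_PAGE) still reports all six start tags: html, head, title, body, h1 and p.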

On the adsense.google.com situation: this probably has nothing to do with invalid HTML. More likely, DOM support in IE9 and earlier is not sufficient for adsense.google.com's needs.

Peter O. answered Oct 20 '22 19:10