Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does HTML5 change the standard for HTML commenting?

Tags:

comments

html

There is no new standard for comments in HTML5. The only valid comment syntax is still <!-- -->. From section 8.1.6 of W3C HTML5:

Comments must start with the four character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS (<!--).

The <! syntax originates in SGML DTD markup, which is not part of HTML5. In HTML5, it is reserved for comments, CDATA sections, and the DOCTYPE declaration. Therefore whether this alternative is bad practice depends on whether you consider the use of (or worse, the dependence on) obsolete markup to be bad practice.

Validator.nu calls what you have a "Bogus comment." — which means that it's treated like a comment even though it's not a valid comment. This is presumably for backward compatibility with pre-HTML5, which was SGML-based, and had markup declarations that took the form <!FOO>, so I wouldn't call this new. The reason they're treated like comments is because SGML markup declarations were special declarations not meant to be rendered, but since they are meaningless in HTML5 (with the above exceptions), as far as the HTML5 DOM is concerned they are nothing more than comments.

The following steps within section 8.2.4 lead to this conclusion, which Chrome appears to be following to the letter:

  1. 8.2.4.1 Data state:

    Consume the next input character:

    "<" (U+003C)
    Switch to the tag open state.

  2. 8.2.4.8 Tag open state:

    Consume the next input character:

    "!" (U+0021)
    Switch to the markup declaration open state.

  3. 8.2.4.45 Markup declaration open state:

    If the next two characters are both "-" (U+002D) characters, consume those two characters, create a comment token whose data is the empty string, and switch to the comment start state.

    Otherwise, if the next seven characters are an ASCII case-insensitive match for the word "DOCTYPE", then consume those characters and switch to the DOCTYPE state.

    Otherwise, if there is an adjusted current node and it is not an element in the HTML namespace and the next seven characters are a case-sensitive match for the string "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET character before and after), then consume those characters and switch to the CDATA section state.

    Otherwise, this is a parse error. Switch to the bogus comment state. The next character that is consumed, if any, is the first character that will be in the comment.

    Notice that it says to switch to the comment start state only if the sequence of characters encountered is <!--, otherwise it's a bogus comment. This reflects what is stated in section 8.1.6 above.

  4. 8.2.4.44 Bogus comment state:

    Consume every character up to and including the first ">" (U+003E) character or the end of the file (EOF), whichever comes first. Emit a comment token whose data is the concatenation of all the characters starting from and including the character that caused the state machine to switch into the bogus comment state, up to and including the character immediately before the last consumed character (i.e. up to the character just before the U+003E or EOF character), but with any U+0000 NULL characters replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment was started by the end of the file (EOF), the token is empty. Similarly, the token is empty if it was generated by the string "<!>".)

    In plain English, this turns <!div displayed> into <!--div displayed--> and <!/div> into <!--/div-->, exactly as described in the question.

On a final note, you can probably expect other HTML5-compliant parsers to behave the same as Chrome.


I don't think this is a good habit to take since <! stands for markup declarations like <!DOCTYPE. Thus you think it's commented (well... browser will try to interpret it).

Even if it doesn't appear, this seems not to be the correct syntax for commenting HTML code.