Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What characters must be escaped in HTML 5?

Tags:

HTML 4 states pretty which characters should be escaped:

Four character entity references deserve special mention since they are frequently used to escape special characters:

  • "&lt;" represents the < sign.
  • "&gt;" represents the > sign.
  • "&amp;" represents the & sign.
  • "&quot; represents the " mark.

Authors wishing to put the "<" character in text should use "&lt;" (ASCII decimal 60) to avoid possible confusion with the beginning of a tag (start tag open delimiter). Similarly, authors should use "&gt;" (ASCII decimal 62) in text instead of ">" to avoid problems with older user agents that incorrectly perceive this as the end of a tag (tag close delimiter) when it appears in quoted attribute values.

Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.

Some authors use the character entity reference "&quot;" to encode instances of the double quote mark (") since that character may be used to delimit attribute values.

I'm surprised I can't find anything like this in HTML 5. With the help of grep the only non-XML mention I could find comes as an aside regarding the deprecated XMP element:

Use pre and code instead, and escape "<" and "&" characters as "&lt;" and "&amp;" respectively.

Could somewhat point to the official source on this matter?

like image 905
ezequiel-garzon Avatar asked Sep 01 '14 19:09

ezequiel-garzon


People also ask

What characters have to be escaped regex?

Operators: * , + , ? , | Anchors: ^ , $ Others: . , \ In order to use a literal ^ at the start or a literal $ at the end of a regex, the character must be escaped.

What characters should be escaped in JSON?

In JSON the only characters you must escape are \, ", and control codes. Thus in order to escape your structure, you'll need a JSON specific function.

What does escaping characters mean in HTML?

Escaping in HTML means, that you are replacing some special characters with others. In HTML it means usally, you replace e. e.g < or > or " or & . These characters have special meanings in HTML. Imagine, you write <b>hello, world</b> And the text will appear as hello, world.


1 Answers

The specification defines the syntax for normal elements as:

Normal elements can have text, character references, other elements, and comments, but the text must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand. Some normal elements also have yet more restrictions on what content they are allowed to hold, beyond the restrictions imposed by the content model and those described in this paragraph. Those restrictions are described below.

So you have to escape <, or & when followed by anything that could begin a character reference. The rule on ampersands is the only such rule for quoted attributes, as the matching quotation mark is the only thing that will terminate one. (Obviously, if you don’t want to terminate the attribute value there, escape the quotation mark.)

These rules don’t apply to <script> and <style>; you should avoid putting dynamic content in those. (If you have to include JSON in a <script>, replace < with \x3c, the U+2028 character with \u2028, and U+2029 with \u2029 after JSON serialization.)

like image 142
Ry- Avatar answered Sep 19 '22 06:09

Ry-