
Newlines and special characters in HTML attributes

Tags:

html

My questions are simple:

Is the following valid? If it is, would it break in some browsers?

<div data-text="Blah blah blah
More blah
And just a little extra blah to finish"> ... </div>

Which characters "must" be encoded in attribute values? I know " should be &quot;, but are any others required to be encoded?

Niet the Dark Absol asked Nov 08 '11 07:11



2 Answers

Is the following valid?

It's a valid fragment of HTML5, yes.

would it break in some browsers?

Unlikely.
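One rough way to see what a parser makes of the multi-line value is to feed the question's markup to Python's standard html.parser module. This is only a sketch: html.parser is not a browser engine, and the AttrCollector class and the data_text variable are names made up for this illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of each start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.seen.append((tag, dict(attrs)))


# The markup from the question, line breaks included.
markup = '''<div data-text="Blah blah blah
More blah
And just a little extra blah to finish"> ... </div>'''

collector = AttrCollector()
collector.feed(markup)
collector.close()

tag, attrs = collector.seen[0]
data_text = attrs["data-text"]

# The line feeds are part of the attribute value.
print(data_text.splitlines())
```

In this parser the value comes back as three lines, so the newlines are kept as ordinary characters rather than being treated as an error.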

Which characters "must" be encoded in attribute values? I know " should be &quot;, but are any others required to be encoded?

That depends on whether the attribute value is double quoted, single quoted or unquoted.

For the double-quoted form, " must be replaced by its character reference, and & may need to be replaced by its character reference, depending on the characters that follow it. See attribute-value-double-quoted-state

For the single-quoted form, ' must be replaced by its character reference, and & may need to be replaced by its character reference, depending on the characters that follow it. See attribute-value-single-quoted-state

For the unquoted form, TAB, LINEFEED, FORMFEED, SPACE and > must be replaced by their character references, and & may need to be replaced by its character reference, depending on the characters that follow it. The HTML syntax rules additionally disallow ", ', =, < and ` in unquoted values, and an unquoted value cannot be empty. See attribute-value-unquoted-state
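The three cases above can be sketched as a small Python helper. This is an illustration under some assumptions, not a reference implementation: escape_attr and its form parameter are hypothetical names, every & is escaped even where the rules would let it through (stricter than required, but always safe), and the unquoted branch also escapes ", ', =, < and `, which the HTML syntax rules disallow in unquoted values.

```python
import html


def escape_attr(value: str, form: str = "double") -> str:
    """Return value escaped (and quoted, if applicable) for one attribute form.

    form is a hypothetical parameter: "double", "single" or "unquoted".
    """
    # Escape & first so the references added below are not escaped again.
    out = value.replace("&", "&amp;")
    if form == "double":
        return '"' + out.replace('"', "&quot;") + '"'
    if form == "single":
        return "'" + out.replace("'", "&#39;") + "'"
    if form == "unquoted":
        if value == "":
            raise ValueError("an unquoted attribute value cannot be empty")
        # TAB, LF, FF, SPACE and > end or break an unquoted value;
        # the remaining characters are not allowed there either.
        for ch in "\t\n\f \"'=<>`":
            out = out.replace(ch, f"&#{ord(ch)};")
        return out
    raise ValueError(f"unknown form: {form!r}")


double = escape_attr('say "hi" & go')         # "say &quot;hi&quot; &amp; go"
single = escape_attr("it's", "single")        # 'it&#39;s'
unquoted = escape_attr("a b>c", "unquoted")   # a&#32;b&#62;c

# Decoding the references gives back the original text.
round_trip = html.unescape(double[1:-1])
```

Note that a line feed inside a quoted value is left alone, since neither quoted form requires it to be encoded.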

Alohci answered Oct 25 '22 20:10


HTML 5 spec

There are different requirements for different attributes, so there isn't one answer. For instance, title attributes allow line feeds, but a class attribute is a space-separated list of tokens.

For data-* attributes, though, the spec says of the attribute name:

contains no characters in the range U+0041 to U+005A (LATIN CAPITAL LETTER A to LATIN CAPITAL LETTER Z).

Other than that, it doesn't make any distinctions.
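As a sketch of that naming rule in Python: the function name below is hypothetical, and besides the quoted restriction on U+0041 to U+005A it checks that the name starts with "data-" and has at least one character after the hyphen, which the spec also requires. The spec's further requirement that the name be XML-compatible is not checked here.

```python
def is_valid_data_attr_name(name: str) -> bool:
    """Hypothetical check of a data-* attribute name (not its value)."""
    prefix = "data-"
    if not name.startswith(prefix) or len(name) == len(prefix):
        return False
    # No characters in the range U+0041 to U+005A (A to Z).
    return not any("A" <= ch <= "Z" for ch in name)


print(is_valid_data_attr_name("data-text"))   # True
print(is_valid_data_attr_name("data-Text"))   # False
```

The attribute value itself is unrestricted by this rule, which is why the multi-line value in the question is fine.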

stevebot answered Oct 25 '22 21:10