
Difference in HTML Entity length in JavaScript

Why does the entity &nbsp; have length 6 while the entity &darr; has length 1? Is this in the spec somewhere? (Tested in Firefox, Chrome and Safari.)

JSFiddle

Chris Middleton asked Feb 12 '23 02:02


1 Answer

I agree that this is very weird behavior, but at least it's specified.

The HTML fragment serialization algorithm states that:

Escaping a string (for the purposes of the algorithm above) consists of replacing any occurrences of the "&" character by the string "&amp;", any occurrences of the "<" character by the string "&lt;", any occurrences of the ">" character by the string "&gt;", any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;", and, if the algorithm was invoked in the attribute mode, any occurrences of the """ character by the string "&quot;".
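
To make the quoted rule concrete, here is a minimal sketch of that escaping step as a plain string function. It follows only the wording quoted above and is not how any browser actually implements it; the names escapeForSerialization and attributeMode are made up for illustration.

```javascript
// Sketch of the escaping step as quoted above (not browser source code).
// escapeForSerialization and attributeMode are illustrative names.
function escapeForSerialization(text, attributeMode = false) {
  let out = text
    .replace(/&/g, "&amp;") // runs first so the entities added below are not re-escaped
    .replace(/\u00A0/g, "&nbsp;") // U+00A0 NO-BREAK SPACE
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  if (attributeMode) {
    out = out.replace(/"/g, "&quot;");
  }
  return out;
}

const nbsp = escapeForSerialization("\u00A0"); // becomes the 6-character string "&nbsp;"
const darr = escapeForSerialization("\u2193"); // "↓" is not in the list, so it stays as is
console.log(nbsp.length, darr.length); // 6 1
```

Under this rule a no-break space always grows to six characters when serialized, while a character such as ↓ (U+2193) passes through untouched, which matches the lengths seen in the question.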

Emphasis by me (on the U+00A0 NO-BREAK SPACE clause). If I had to guess, this is to support backwards compatibility with older browsers that did this, and to get consistent behavior when serializing and deserializing strings. If the browser serialized the DOM tree resulting from <div>&nbsp;&nbsp;</div> with the two no-break spaces written out as raw characters, deserializing that markup into a DOM tree again would seem to result in a single space*. Escaping is pretty much the only way the browser can achieve consistent behavior.

Replacing &darr; with the literal character ↓, on the other hand, is completely safe and makes sense.

If you're actually interested in the length of the string stored inside the text node, use .textContent instead and you'll get the result you expected.
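
As a rough illustration of the difference, the sketch below compares the two lengths for the same element. measureLengths is a hypothetical helper, not a DOM API; it only reads the standard innerHTML and textContent properties, and the usage part assumes a browser environment.

```javascript
// measureLengths is a hypothetical helper; it accepts any object exposing
// innerHTML and textContent, such as a DOM element.
function measureLengths(el) {
  return {
    serialized: el.innerHTML.length, // length of the escaped markup
    text: el.textContent.length, // length of the text actually stored
  };
}

// Browser-only usage; skipped where no DOM is available (e.g. plain Node.js).
if (typeof document !== "undefined") {
  const div = document.createElement("div");
  div.innerHTML = "&nbsp;";
  console.log(measureLengths(div)); // serialized is 6, text is 1
}
```

The text node holds a single U+00A0 character; only the serialized form read back through innerHTML is six characters long.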

* Well, not really, since it would still be a U+00A0 (&nbsp;) character, but I can see why people might have found this confusing in the early DOM days.

Benjamin Gruenbaum answered Feb 16 '23 02:02