Why do HTML entity names with dec < 255 not require semicolon?

Q: Are there semicolons in HTML?

In HTML, a semicolon is used to terminate a character entity reference, either named or numeric. The declarations of a style attribute in Cascading Style Sheets (CSS) are separated and terminated with semicolons.

Q: What are symbol entities, and why are they required in HTML?

An HTML entity is a piece of text ("string") that begins with an ampersand ( & ) and ends with a semicolon ( ; ). Entities are frequently used to display reserved characters (which would otherwise be interpreted as HTML code), and invisible characters (like non-breaking spaces).

Q: What is the format for character entity reference?

The format for a character entity reference is &name;, where name is a case-sensitive alphanumeric string and the terminating semicolon is required.

Tags:

html

html-entities

behavior

In a plain HTML document, &pound (dec 163) renders as £ without needing the ;, whereas &oelig (dec 339) only renders as œ with the semicolon. It seems that every HTML entity with a decimal value of 255 or below will render without the semicolon, in both Firefox and Chrome.

What gives?

518

asked Sep 08 '13 22:09

bryc

2 Answers

The reason is that historically the semicolon has been optional when an entity reference (or a character reference) is not immediately followed by a name character. So &pound? is OK since ? is not a name character (i.e., a character allowed in names), but &pound4 is not, since 4 is a name character, making pound4 the entity name (which is undefined in HTML, but might become defined some day). This rule is part of the SGML legacy in HTML, one of the few cases where browsers actually applied the peculiarities of SGML.
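To make the name-character rule concrete, here is a minimal sketch in Python of how an SGML-style reader would pick out the entity name. It assumes, as a simplification, that name characters are ASCII letters, digits, "." and "-"; the function name sgml_entity_names is made up for this illustration and is not part of any library.

```python
import re

# Assumption: a name starts with a letter and continues with letters,
# digits, '.' or '-'. The trailing ';' is optional, as in the historical rule.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);?")

def sgml_entity_names(text):
    """Return the entity names an SGML-style reader would see in text."""
    return ENTITY_REF.findall(text)

print(sgml_entity_names("&pound?"))   # '?' is not a name character, so the name ends before it
print(sgml_entity_names("&pound4"))   # '4' is a name character, so it becomes part of the name
print(sgml_entity_names("&pound;4"))  # the semicolon ends the name explicitly
```

Under this rule the only way to write a pound sign directly followed by a digit is to include the semicolon.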

It has, however, always been regarded as good practice to terminate entity references by a semicolon. XML, and hence XHTML, makes it even formally mandatory.

This is why current browser practices allow omission of semicolons as in “classic” HTML, but only for the limited set of character references denoting ISO Latin 1 characters, i.e. characters with a Unicode number less than 256 in decimal (at most FF in hexadecimal). This was the original set of entity references, and therefore such references have been widely used without a semicolon. So the practices are a compromise: browsers want to encourage the recommended notation without invalidating the bulk of old pages, let alone failing to render them properly.
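One way to check this claim without a browser is to look at the HTML5 named character reference table that ships with Python's standard library (html.entities.html5). In that table, names that may appear without a semicolon are stored without a trailing ";". The sketch below is only a check against Python's copy of the table, not a statement about any particular browser; the helper max_code_point is made up for the example.

```python
from html.entities import html5

# Entries whose key has no trailing ';' are the legacy names that a parser
# may still recognize when the semicolon is omitted.
legacy = {name: value for name, value in html5.items() if not name.endswith(";")}

def max_code_point(table):
    """Highest Unicode code point among all characters in the table's values."""
    return max(ord(ch) for value in table.values() for ch in value)

print(len(legacy), "names are usable without a semicolon")
print("highest code point among them:", hex(max_code_point(legacy)))
print("pound" in legacy, "oelig" in legacy)
```

Note that the check only goes one way: every semicolon-less name maps into the Latin 1 range, but that does not mean every named reference below 256 has a semicolon-less form.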

The HTML5 drafts have taken various positions on this, but e.g. the HTML5 CR of 6 August 2013 requires the semicolon in all cases, even in HTML syntax. Omitting the semicolon is defined as a parse error, which means that the error handling is well defined (the entity is still recognized), but a browser may still stop parsing at the first parse error!

104

answered Oct 09 '22 07:10

Jukka K. Korpela

Firstly, this is entirely up to how forgiving the browser/rendering engine wants to be, and is not a property of HTML: all entities must end in a semi-colon, or you have invalid syntax. (The WHATWG "HTML Living Standard" confusingly considers this semi-colon to be part of the name, making it seem optional in the Developer Edition, but the full Standard text/W3C HTML5 draft is clearer: "The name must be one that is terminated by a U+003B SEMICOLON character (;).")

Secondly, referring to a character as having a "decimal value" is ambiguous at best. 163 and 339 are the "code points" of those characters in Unicode, which would normally be expressed in hexadecimal. Other encodings would have different positions for those characters, which could also be expressed as a "decimal value" if you wanted.

Thirdly, my guess is that it is not so much to do with where they come in a particular encoding sequence, but how common they are - the full list is extremely long (see the WHATWG and W3C lists). There is a trade-off to be made in interpreting such invalid sequences, since a URL might contain unescaped ampersands, which then in turn look like unterminated entities (e.g. http://example.com/foo?bar=rab&oelig=gileo). So browsers are trying to tread that fine line and guess which mistake was probably made in a particular case.
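The URL trade-off can be illustrated with Python's html.unescape, which follows the HTML5 rules for text content. This is a sketch, not a browser test: the second URL and its copy parameter are invented for the example, and browsers apply an extra exception inside attribute values (a semicolon-less reference followed by "=" or an alphanumeric character is left alone) that html.unescape does not model.

```python
from html import unescape

# "&oelig" has no semicolon-less form, so the query string survives intact.
safe_url = "http://example.com/foo?bar=rab&oelig=gileo"
print(unescape(safe_url) == safe_url)

# "&copy" is a legacy name, so a parameter called "copy" is mangled
# when the unescaped ampersand is read as the start of a reference.
risky_url = "http://example.com/foo?a=1&copy=2"
print(unescape(risky_url))

# With the semicolon, both kinds of reference decode as expected.
print(unescape("&oelig;"), unescape("&copy;"))
```

Writing the ampersand as &amp; in the page source avoids the ambiguity altogether.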

answered Oct 09 '22 05:10

IMSoP
