
What characters are allowed in DOM IDs? [duplicate]

Tags: html, dom

Actually, there is a difference between HTML and XHTML. Because XHTML is XML, the rules for XML IDs apply:

Values of type ID MUST match the Name production.

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                  [#xD8-#xF6] | [#xF8-#x2FF] |
                  [#x370-#x37D] | [#x37F-#x1FFF] |
                  [#x200C-#x200D] | [#x2070-#x218F] |
                  [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                  [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                  [#x10000-#xEFFFF]

NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
                  [#x0300-#x036F] | [#x203F-#x2040]

Name          ::= NameStartChar (NameChar)*

Source: Extensible Markup Language (XML) 1.0 (Fifth Edition), section 2.3
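To make the XML rule concrete, here is a minimal Python sketch that transcribes the NameStartChar and NameChar productions above into code point ranges and tests a string against Name. The function name `is_xml_name` is hypothetical. It checks only the characters of a single value, not whether the value is unique in the document. Python strings iterate by code point, so characters above #xFFFF are checked as one character each.

```python
# Code point ranges transcribed from the NameStartChar production.
NAME_START_RANGES = [
    (ord(":"), ord(":")),
    (ord("A"), ord("Z")),
    (ord("_"), ord("_")),
    (ord("a"), ord("z")),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra ranges that NameChar allows on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")),
    (ord("."), ord(".")),
    (ord("0"), ord("9")),
    (0xB7, 0xB7),
    (0x0300, 0x036F),
    (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    """True if code point cp falls inside any (lo, hi) range, inclusive."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml_name(value: str) -> bool:
    """True if value matches Name ::= NameStartChar (NameChar)*."""
    if not value:
        return False
    if not _in_ranges(ord(value[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in value[1:]
    )


print(is_xml_name("_main:content-1"))  # True
print(is_xml_name("1st"))              # False: a digit cannot start a Name
print(is_xml_name("étape"))            # True: U+00E9 is in [#xD8-#xF6]
```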

For HTML 4, the following applies:

id = name [CS]
This attribute assigns a name to an element. This name must be unique in a document.

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

Source: HTML 4 Specification, Chapter 6, ID Token
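As a rough illustration, the following minimal Python sketch checks both HTML 4 constraints quoted above: every `id` value must be a valid ID token, and no value may appear twice in one document. The helper names `is_html4_id` and `check_html4_ids` are hypothetical. The standard library's `html.parser` is used only to collect `id` attributes; it is a lenient parser, not a validator.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# HTML 4 ID token: one ASCII letter, then any number of letters,
# digits, underscores, colons, periods, or hyphens.
HTML4_ID_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def is_html4_id(value: str) -> bool:
    """True if the whole value matches the HTML 4 ID token rule."""
    return HTML4_ID_TOKEN.fullmatch(value) is not None


class _IdCollector(HTMLParser):
    """Records every id attribute value, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def check_html4_ids(markup: str):
    """Return (invalid_ids, duplicate_ids) found in markup."""
    collector = _IdCollector()
    collector.feed(markup)
    collector.close()
    invalid = [v for v in collector.ids if not is_html4_id(v)]
    duplicates = [v for v, n in Counter(collector.ids).items() if n > 1]
    return invalid, duplicates


doc = '<div id="main"><p id="1st"></p><span id="main"></span></div>'
print(check_html4_ids(doc))  # (['1st'], ['main'])
```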


The W3C specification's "Basic HTML data types" chapter says: "ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".")."


If we take the title of your question literally, then neither the HTML nor the XHTML rules apply. Instead, the relevant specification is the DOM.

Taking DOM Level 3 as our source, and assuming that by "DOM ID" you mean an attribute with the "ID" flag set, the value is a DOMString, whose characters can be any character that UTF-16 can encode.

16-bit unit

The base unit of a DOMString. This indicates that indexing on a DOMString occurs in units of 16 bits. This must not be misunderstood to mean that a DOMString can store arbitrary 16-bit units. A DOMString is a character string encoded in UTF-16; this means that the restrictions of UTF-16 as well as the other relevant restrictions on character strings must be maintained. A single character, for example in the form of a numeric character reference, may correspond to one or two 16-bit units.
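The point about 16-bit units can be checked with a short Python sketch. The helper `utf16_units` is hypothetical; it counts how many 16-bit units a string occupies when encoded as UTF-16, which is how a DOMString is indexed. Note that Python's own `len()` counts code points instead, so the two differ for characters outside the Basic Multilingual Plane.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit units s occupies when encoded as UTF-16."""
    # Each UTF-16 unit is 2 bytes; the -le variant adds no byte order mark.
    return len(s.encode("utf-16-le")) // 2


print(utf16_units("main"))        # 4
print(utf16_units("é"))           # 1: U+00E9 fits in one unit
print(utf16_units("\U0001F600"))  # 2: U+1F600 needs a surrogate pair
print(len("\U0001F600"))          # 1: Python counts code points

# A lone surrogate is not a character, so it cannot be encoded as UTF-16.
try:
    "\ud800".encode("utf-16-le")
except UnicodeEncodeError:
    print("lone surrogate rejected")
```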

Of course, this is probably not what you want, and Ludwig Weinzierl's answer is likely what you were looking for. However, it is wise to understand that not all DOMs can be serialized as HTML or XHTML, and that the DOM has its own set of rules.

