Is there a good rule of thumb for when to use decimal vs. hexadecimal notation for HTML entities?
For example, a non-breaking hyphen is written in decimal as &#8209; and in hex as &#x2011;.
This answer says that hexadecimal is for Unicode; does that mean hex should be used if you're using the <meta charset="utf-8"> tag in the document <head>?
Occasionally, I will notice entity codes mistakenly rendered as-is instead of the characters they represent -- for example, &amp;
appearing (instead of an ampersand) in an email subject line or RSS headline. Is either hex or decimal better for avoiding this?
One last consideration: can using hex or decimal affect the rendering clarity (crispness) of the character?
There are two systems because hex is more natural from a low-level point of view (each of the 3 values is in a one-byte range) and decimal is more natural for human developers.
Entities are frequently used to display reserved characters (which would otherwise be interpreted as HTML code), and invisible characters (like non-breaking spaces). You can also use them in place of other characters that are difficult to type with a standard keyboard.
Hex code is one of several ways of labeling colors in CSS and HTML. While named HTML color codes like “aquamarine” and “cadet blue” are convenient, they do not supply the range of possible colors found in hex code. To create a hex code color, it is easiest to start with the first character in each RGB pairing.
The rule of thumb is: use whichever you prefer, but prefer hex. ☺
There is no difference in meaning and no difference in browser support (the last browsers that supported decimal references only died in the 1990s).
As @AlexW describes, hexadecimal references are more natural than decimal, due to the way character code standards are written. But if you find decimal references more convenient, use them.
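To make the "no difference in meaning" point concrete, here is a minimal Python sketch. The helper name numeric_references is made up for illustration; html.unescape is the standard-library function that decodes character references. It builds both notations for the non-breaking hyphen from the question and checks that they decode to the same character.

```python
import html


def numeric_references(ch):
    """Return (decimal, hex) character references for a single character.

    Hypothetical helper, written only for this illustration.
    """
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


# U+2011 is the non-breaking hyphen from the question.
decimal_ref, hex_ref = numeric_references("\u2011")
print(decimal_ref, hex_ref)

# Both notations name the same code point, so they decode identically.
assert html.unescape(decimal_ref) == html.unescape(hex_ref) == "\u2011"
```

The two strings differ only in how the code point number is written (base 10 vs. base 16), which is why picking one over the other is a matter of taste.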
The issue has nothing to do with meta tags and character encodings. The main reason why character references were introduced into HTML is that they let you enter characters quite independently of the encoding of the document. This includes characters that cannot be directly written at all in the encoding used. Thanks to them, you can enter any Unicode character even if the character encoding is ASCII or some other limited encoding, like ISO-8859-1.
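As a rough illustration of that encoding independence, the following Python sketch (the sample text is arbitrary) encodes a string as plain ASCII. The standard "xmlcharrefreplace" error handler replaces every character that ASCII cannot represent with a decimal character reference, and html.unescape recovers the original text.

```python
import html

# Sample text containing two characters that ASCII cannot represent.
text = "Ohm: \u03a9, hyphen: \u2011"

# "xmlcharrefreplace" substitutes a decimal numeric character reference
# for each character outside the target encoding.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)

# Decoding the references restores the original characters.
round_tripped = html.unescape(ascii_bytes.decode("ascii"))
assert round_tripped == text
```

The resulting bytes are pure ASCII, yet a browser (or any reference-aware parser) would still display the Unicode characters.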
In the old days, it was common to recommend the use of named references (or “entity references” as they are formally called in classic HTML), when possible, because a reference like &Omega;, when displayed literally to the user, is more understandable than a reference like &#937; or &#x3A9;. This hasn’t been relevant for over a decade, as far as web browsers are concerned. But e.g. e-mail clients might be kind of stupid^H^H^H^H^H^H^H^H^H underdeveloped in this respect. They might e.g. show references as such in a list of messages, even though they can interpret them properly when viewing a message. But there does not seem to be any consistent behavior that you could count on.
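For comparison, here is a small Python sketch that lists the named, decimal, and hex forms side by side. The helper all_references is hypothetical; codepoint2name is the standard-library table of HTML 4 entity names, so characters without an HTML 4 name (such as the non-breaking hyphen) come back with no named form.

```python
import html
from html.entities import codepoint2name


def all_references(ch):
    """Return (named, decimal, hex) references; named is None if no
    HTML 4 entity name exists. Hypothetical helper for illustration."""
    code_point = ord(ch)
    name = codepoint2name.get(code_point)
    named = f"&{name};" if name else None
    return named, f"&#{code_point};", f"&#x{code_point:X};"


omega_refs = all_references("\u03a9")   # Greek capital omega
hyphen_refs = all_references("\u2011")  # non-breaking hyphen
print(omega_refs)
print(hyphen_refs)

# Every available form decodes to the same character.
assert all(html.unescape(ref) == "\u03a9" for ref in omega_refs)
```

If a reference does end up shown literally, the named form is the only one of the three that a human reader is likely to recognize, which is the argument the old recommendation rested on.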