Generally, you don't need to use HTML entities if your editor supports Unicode. For some instances, entities can be useful: Your editor does not support Unicode. Your keyboard does not support the character you would like to type, such as em-dash or the copyright symbol.
Character entities are used to display reserved characters in HTML.
Definition and Usage The htmlentities() function converts characters to HTML entities. Tip: To convert HTML entities back to characters, use the html_entity_decode() function. Tip: Use the get_html_translation_table() function to return the translation table used by htmlentities().
Based on the comments I have received, I looked into this a little further. It seems that currently the best practice is to forgo using HTML entities and use the actual UTF-8 character instead. The reasons listed are as follows:
As long as your page's encoding is properly set to UTF-8, you should use the actual character instead of an HTML entity. I read several documents about this topic, but the most helpful were:
From the UTF-8: The Secret of Character Encoding article:
Wikipedia is a great case study for an application that originally used ISO-8859-1 but switched to UTF-8 when it became far too cumbersome to support foreign languages. Bots will now actually go through articles and convert character entities to their corresponding real characters for the sake of user-friendliness and searchability.
That article also gives a nice example involving Chinese encoding. Here is the abbreviated example for the sake of laziness:
UTF-8:
這兩個字是甚麼意思
HTML Entities:
這兩個字是甚麼意思
The UTF-8 and HTML entity encodings are both meaningless to me, but at least the UTF-8 encoding is recognizable as a foreign language, and it will render properly in an edit box. The article goes on to say the following about the HTML entity-encoded version:
Extremely inconvenient for those of us who actually know what character entities are, totally unintelligible to poor users who don't! Even the slightly more user-friendly, "intelligible" character entities like θ will leave users who are uninterested in learning HTML scratching their heads. On the other hand, if they see θ in an edit box, they'll know that it's a special character, and treat it accordingly, even if they don't know how to write that character themselves.
As others have noted, you still have to use HTML entities for reserved XML characters (ampersand, less-than, greater-than).
You don't generally need to use HTML character entities if your editor supports Unicode. Entities can be useful when:
code is clearer than the corresponding white space character.<
, &
, or "
.Entities may buy you some compatibility with brain-dead clients that don't understand encodings correctly. I don't believe that includes any current browsers, but you never know what other kinds of programs might be hitting you up.
More useful, though, is that HTML entities protect you from your own errors: if you misconfigure something on the server and you end up serving a page with an HTTP header that says it's ISO-8859-1
and a META
tag that says it's UTF-8
, at least your —es will always work.
I would not use UTF-8 for characters that are easily confused visually. For example, it is difficult to distinguish an emdash from a minus, or especially a non-breaking space from a space. For these characters, definitely use entities.
For characters that are easily understood visually (such as the chinese examples above), go ahead and use UTF-8 if you like.
Personally I do everything in utf-8 since a long time, however, in an html page, you always need to convert ampersands (&), greater than (>) and lesser then (<) characters to their equivalent entities, &, > and <
Also, if you intend on doing some programming using utf-8 text, there are a few thing to watch for.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With