I am working with a large number of HTML files that are mostly encoded as UTF-8. There are accented characters galore, as many of the files are in French. I have been converting them to HTML entities as I go, but I noticed that even in IE5.5 (according to IETester) the unconverted accented characters display properly.
Should I be concerned with character display and convert them all to HTML entities just to be on the safe side?
Some characters are reserved in HTML. If you use the less-than (<) or greater-than (>) sign in your text, the browser might mix it up with tags. Character entities are used to display reserved characters in HTML. Characters that are not present on your keyboard can also be replaced by entities.
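A minimal sketch of escaping reserved characters, using Python's standard-library html.escape (the choice of Python and the sample strings are illustrative, not from the thread). Note that it leaves accented letters alone, since they are not reserved:

```python
import html

# Text that contains characters HTML treats as markup.
raw = 'if a < b && b > c then say "ok"'

# html.escape replaces & first, then < and >, and (with quote=True)
# " and ' with character references so the browser shows them literally.
escaped = html.escape(raw, quote=True)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c then say &quot;ok&quot;

# Accented letters are not reserved, so they pass through unchanged.
print(html.escape("déjà vu"))  # déjà vu
```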
Answer: You can include accented characters and ligatures in JavaScript strings and/or display them on your HTML pages using the following encodings for the letters:
- hexadecimal codes \xXX in JavaScript strings; e.g. ñ is \xF1
- Unicode hex codes \uXXXX in JavaScript strings; e.g. š is \u0161
- HTML entities; e.g. ñ is &ntilde; and š is &scaron;
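The answer above targets JavaScript, but Python string literals happen to use the same \xXX and \uXXXX escape forms, so the equivalence of all three encodings can be checked with a short sketch (Python and its standard html module are a substitution for illustration, not part of the answer):

```python
import html

# \xXX: two hex digits, enough for code points up to U+00FF (Latin-1 range).
n_tilde = "\xF1"
# \uXXXX: four hex digits, needed for code points above U+00FF such as š.
s_caron = "\u0161"

# The named HTML entities for the same letters, decoded back to characters.
from_entities = html.unescape("&ntilde; &scaron;")

print(n_tilde, s_caron)                         # ñ š
print(from_entities)                            # ñ š
print(from_entities == f"{n_tilde} {s_caron}")  # True
```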
HTML entities let us display characters such as ‘<’ or ‘>’, which the browser might otherwise interpret as code.
HTML entities begin with an ampersand (&) and end with a semicolon (;), and are written as &entity_name; or &#entity_number; (for example &lt; or &#60; for <). HTML reserves some characters, and the browser may misinterpret them as markup. There are also some characters that are not present on a standard keyboard but are needed on a web page.
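To make the &...; syntax concrete, here is a small sketch that builds the three common forms for one character: a named entity (when the HTML 4 table defines one), a decimal reference &#NNN; and a hexadecimal reference &#xHH;. The helper name entity_forms is hypothetical; it relies on Python's html.entities.codepoint2name table:

```python
import html
from html.entities import codepoint2name


def entity_forms(ch: str) -> tuple:
    """Return (named, decimal, hex) references for a single character.

    named is None when the HTML 4 entity table has no name for ch.
    """
    cp = ord(ch)
    name = codepoint2name.get(cp)  # e.g. 0xE9 -> "eacute"
    named = f"&{name};" if name else None
    return named, f"&#{cp};", f"&#x{cp:X};"


for ch in "éçš€":
    forms = entity_forms(ch)
    # Every available form decodes back to the same character.
    assert all(html.unescape(f) == ch for f in forms if f)
    print(ch, forms)
```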
If the files are UTF-8 encoded, you should set the Content-Type header to text/html; charset=UTF-8 and have an equivalent meta tag on the page:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This gives the browser all the information it needs to display UTF-8 characters correctly. There is no need to encode accented characters.
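As a sketch of the server side, here is a minimal WSGI application written with plain Python (the function name app and the page content are illustrative assumptions) that encodes the page as UTF-8 bytes and sends the matching Content-Type header alongside the meta tag:

```python
# Page text with unencoded accented letters and the UTF-8 meta tag.
PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
    "<title>Café</title>\n"
    "</head><body><p>Déjà vu à Montréal</p></body></html>\n"
)


def app(environ, start_response):
    """Serve PAGE as UTF-8 bytes and declare that encoding in the header."""
    body = PAGE.encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=UTF-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

It can be tried locally with wsgiref.simple_server.make_server("", 8000, app).serve_forever(). On a static Apache setup the same header is more commonly set in server configuration, for example with the AddDefaultCharset UTF-8 directive.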
There is normally no reason to use entities for characters like accented letters. Using them is valid but tends to obfuscate the source code and may therefore cause errors.
However, in some cases the entities are needed. The reasons are not related to browsers but to the authoring side. In particular, if you need to edit the files using an editor or an authoring program that does not handle accented letters well, you may find entities useful. The same applies if the data has to pass through some software that has similar problems. And in some cases, you need to work within an environment where you have no control over HTTP headers and the headers specify an encoding that does not let you enter all characters directly.
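When an editor or some intermediate software only copes with ASCII, one way to produce entity-encoded files is Python's built-in xmlcharrefreplace error handler, which swaps every non-ASCII character for a numeric character reference. A sketch, assuming the input is already decoded text (the helper name to_ascii_html is hypothetical):

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference like &#233;.

    ASCII markup such as < and & is left untouched. Caveat: browsers do not
    decode references inside <script> or <style>, so non-ASCII text there
    needs separate handling.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


source = "<p>Élève, garçon, déjà</p>"
print(to_ascii_html(source))
# <p>&#201;l&#232;ve, gar&#231;on, d&#233;j&#224;</p>
```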
The thing to remember is that French, like Portuguese and Spanish, is fully covered by UTF-8, so its accented characters will display properly with a UTF-8 declaration in place, provided the browser is also using UTF-8 for the page.
The problem arises when someone visits the page with a browser that forces another charset: the unencoded characters will then break. This happens fairly often here in Brazil, where many browsers are not set to detect the charset automatically and are instead set to ISO-8859-1, which is common here.
So, where possible, encode all of your "special" characters for the widest possible compatibility.
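The failure described above can be reproduced by decoding a page's UTF-8 bytes as ISO-8859-1, roughly what a browser forced to that charset does (strictly, browsers apply windows-1252 for that label, which gives the same result for the letters used here). A minimal sketch with an illustrative sample string:

```python
page_text = "Café, Montréal, élève"

# Bytes as stored in a UTF-8 file.
utf8_bytes = page_text.encode("utf-8")

# ISO-8859-1 maps each byte to one character, so every two-byte
# UTF-8 sequence turns into two wrong characters.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # CafÃ©, MontrÃ©al, Ã©lÃ¨ve

# The entity-encoded version is pure ASCII, which reads the same
# whether the browser assumes UTF-8, ISO-8859-1 or windows-1252.
encoded = page_text.encode("ascii", "xmlcharrefreplace")
print(encoded.decode("iso-8859-1"))  # Caf&#233;, Montr&#233;al, &#233;l&#232;ve
```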
I hope that helps!