 

Should all accented characters use html entities?

I am working with a large number of HTML files that are mostly encoded as UTF-8. There are accented characters galore, as many of the files are in French. I have been converting them to HTML entities as I go, but I noticed that even in IE5.5 (according to IETester) the unconverted accented characters display properly.

Should I be concerned with character display and convert them all to HTML entities just to be on the safe side?

Damon asked Mar 06 '12 15:03



People also ask

What is a character entity in HTML?

Some characters are reserved in HTML. If you use the less-than (<) or greater-than (>) signs in your text, the browser might mistake them for tags. Character entities are used to display these reserved characters. Characters that are not present on your keyboard can also be written as entities.
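As a rough illustration (a Python sketch, not part of the original thread), the standard library's html.escape converts the reserved characters into entities while leaving accented letters alone. The helper name escape_reserved is hypothetical.

```python
import html


def escape_reserved(text: str) -> str:
    """Hypothetical helper: escape HTML's reserved characters.

    html.escape replaces &, < and > with entities and, with quote=True,
    also " and '. Accented letters such as é are left as they are.
    """
    return html.escape(text, quote=True)


print(escape_reserved("if a < b && b > c"))  # if a &lt; b &amp;&amp; b &gt; c
print(escape_reserved("café"))               # café
```

Note that only the reserved characters change; the accented letter passes through untouched.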

Can I include accented characters and ligatures in JavaScript strings?

Answer: You can include accented characters and ligatures in JavaScript strings and/or display them on your HTML pages using the following encodings for the letters: hexadecimal codes \xXX in JavaScript strings (e.g. ñ is \xF1); Unicode hex codes \uXXXX in JavaScript strings (e.g. š is \u0161); and HTML entities (e.g. ñ is &ntilde; and š is &scaron;).
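To see how those forms relate to a character's code point, here is a small Python sketch (the helper encodings_for is hypothetical) that builds each escape for a given letter. It assumes the named-entity list is the HTML 4 one exposed by html.entities.codepoint2name.

```python
from html.entities import codepoint2name


def encodings_for(ch: str) -> dict:
    """Hypothetical helper: list the escape forms mentioned above for one character."""
    cp = ord(ch)
    return {
        # \xXX holds exactly two hex digits, so it only reaches U+00FF
        "js_hex": f"\\x{cp:02X}" if cp <= 0xFF else None,
        # \uXXXX holds four hex digits, so it only covers U+0000..U+FFFF
        "js_unicode": f"\\u{cp:04X}" if cp <= 0xFFFF else None,
        # Named entity, if the HTML 4 list defines one for this code point
        "html_named": f"&{codepoint2name[cp]};" if cp in codepoint2name else None,
        # A decimal numeric character reference works for any code point
        "html_numeric": f"&#{cp};",
    }


for letter in ("ñ", "š"):
    print(letter, encodings_for(letter))
```

š (U+0161) is above U+00FF, which is why it needs the \u form in JavaScript rather than \x.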

Why do we use HTML entities in HTML?

HTML entities let us display characters such as ‘<’ or ‘>’ that the browser would otherwise interpret as markup.

What is the format of HTML entities?

HTML entities begin with an ampersand (&) and end with a semicolon (;). They are written either by name (&name;) or by number (&#number;). HTML reserves some characters that the browser may misinterpret as markup, and there are also some characters that are not present on a standard keyboard but are desired on a web page.
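A quick Python check (illustrative only) shows that the named form and the numeric forms, decimal and hexadecimal, all stand for the same character:

```python
import html

# Three ways of writing é: named entity, decimal and hexadecimal references
forms = ["&eacute;", "&#233;", "&#xE9;"]

for form in forms:
    print(form, "->", html.unescape(form))  # each line ends with é
```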


3 Answers

If the files are UTF-8 encoded, you should set the HTTP Content-Type header to text/html; charset=UTF-8 and include an equivalent meta tag on the page:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

This gives the browser all the information for displaying UTF-8 characters correctly. There is no need to encode accented characters.
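If the pages happen to be served from Python (an assumption; the question does not say how they are served), a minimal WSGI sketch that sends both the header and the matching meta tag could look like this. PAGE and app are illustrative names.

```python
# A UTF-8 page whose accented letters are written directly, not as entities
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Crème brûlée</title>
</head>
<body><p>Les élèves ont mangé à la cantine.</p></body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app: encode the page as UTF-8 and declare it in the header."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever() will serve it; any web server that sends the same header works equally well.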

Oded answered Oct 25 '22 11:10



There is normally no reason to use entities for characters like accented letters. Using them is valid but tends to obfuscate the source code and may therefore cause errors.

However, in some cases the entities are needed. The reasons are not related to browsers but to the authoring side. In particular, if you need to edit the files using an editor or an authoring program that does not handle accented letters well, you may find entities useful. The same applies if the data has to pass through some software that has similar problems. And in some cases, you need to work within an environment where you have no control over HTTP headers and the headers specify an encoding that does not let you enter all characters directly.
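For the authoring-side cases described above, a Python sketch (the helper name is hypothetical) can turn every non-ASCII character into a numeric character reference using the built-in "xmlcharrefreplace" error handler. It assumes the input is already valid HTML text, since it does not touch &, < or >.

```python
def to_ascii_with_entities(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with &#NNN;.

    The "xmlcharrefreplace" error handler emits a decimal character
    reference for anything the ASCII codec cannot encode.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_with_entities("Crème brûlée à la française"))
# Cr&#232;me br&#251;l&#233;e &#224; la fran&#231;aise
```

The result is plain ASCII, so an editor or pipeline that mishandles accented letters cannot corrupt it, at the cost of less readable source.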

Jukka K. Korpela answered Oct 25 '22 13:10



The thing you need to remember is that the accented letters used in French, Portuguese, Spanish, etc. are all covered by UTF-8, so they will display properly with a UTF-8 declaration in place, provided the browser is also decoding the page as UTF-8.

The problem arises when someone visits the page with a browser that forces a different charset: the unencoded characters will then display incorrectly. This happens a fair amount here in Brazil, where many browsers are not set to detect the charset automatically and are instead set to ISO-8859-1, which is common here.
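The breakage described here is easy to reproduce in Python (an illustration, not something from the original answer): decode UTF-8 bytes as ISO-8859-1 and each accented letter turns into two wrong characters.

```python
# Bytes of a UTF-8 encoded French phrase
raw = "Crème brûlée".encode("utf-8")

# What a browser forced to ISO-8859-1 effectively does with those bytes
misread = raw.decode("iso-8859-1")
print(misread)  # CrÃ¨me brÃ»lÃ©e
```

Each accented letter is two bytes in UTF-8, and ISO-8859-1 maps every byte to its own character, hence the "Ã" pairs.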

So, where possible, encode all of your "special" characters for the widest possible compatibility.
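The reason this works is that entity-encoded text is pure ASCII, and ASCII bytes read the same under UTF-8, ISO-8859-1 and Windows-1252. A Python sketch of that check (the helper name and the charset list are assumptions for illustration):

```python
import html


def survives_charset(text: str, charsets=("utf-8", "iso-8859-1", "cp1252")) -> bool:
    """Hypothetical check: encode non-ASCII as entities, then decode the bytes
    under each charset a browser might pick and unescape the entities."""
    encoded = text.encode("ascii", errors="xmlcharrefreplace")
    return all(html.unescape(encoded.decode(cs)) == text for cs in charsets)


print(survives_charset("Crème brûlée"))  # True
```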

I hope that helps!

Ryan answered Oct 25 '22 12:10
