
Why does my page show � instead of é?

I have set page encoding to UTF-8 in HTML:

<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8">

and in the HTTP header I have:

Content-Type: text/html; charset=UTF-8

Why isn't the é shown correctly?


Update:
The data containing the é was crawled from the Internet; the crawler is written in Microsoft .NET. I used the MySQL .NET Connector to connect to MySQL.

The page to display the é is written in PHP.

syking asked Jun 25 '11 12:06



3 Answers

Make sure your file does not have a BOM (byte order mark) at its beginning. I had this problem recently: even though the file was saved as UTF-8 (I checked several times), the BOM confused Firefox and it displayed umlauts incorrectly (I had both the HTML <meta> tags and the HTTP headers set to the correct encoding).
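One way to check for a BOM is to look at the first three bytes of the file. The following is a minimal Python sketch, not part of the original answer; the function name `strip_utf8_bom` and the sample bytes are hypothetical, and it assumes the file content has already been read as raw bytes.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical file content: a BOM followed by "é" encoded as UTF-8.
with_bom = codecs.BOM_UTF8 + "é".encode("utf-8")
print(with_bom.hex())                  # efbbbfc3a9
print(strip_utf8_bom(with_bom).hex())  # c3a9
```

If the first line prints a leading `efbbbf` for your own file, the editor saved it with a BOM; most editors offer a "UTF-8 without BOM" save option.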

knittl answered Oct 02 '22 03:10


You need to add much more information, but a � is usually a sign of an ISO-8859-1 character in data that is treated as UTF-8.

It comes from either

  • The source file claiming to be UTF-8 but actually being saved as ISO-8859-1/Windows-1252 (check your file encoding in your editor or IDE)

  • A database connection that uses ISO-8859-1 even though the database tables are UTF-8
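The mismatch described above can be reproduced at the byte level. This is an illustrative Python sketch, not code from the question; the helper name `render_as_utf8` is hypothetical. It shows that é saved as ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8 on its own, so a UTF-8 decoder substitutes the replacement character U+FFFD (�).

```python
def render_as_utf8(data: bytes) -> str:
    """Decode bytes the way a browser told 'charset=UTF-8' roughly would:
    invalid sequences become U+FFFD instead of raising an error."""
    return data.decode("utf-8", errors="replace")

latin1_bytes = "é".encode("iso-8859-1")  # single byte 0xE9
utf8_bytes = "é".encode("utf-8")         # two bytes 0xC3 0xA9

print(latin1_bytes.hex())                          # e9
print(utf8_bytes.hex())                            # c3a9
print(render_as_utf8(latin1_bytes) == "\ufffd")    # True
print(render_as_utf8(utf8_bytes) == "é")           # True

# The opposite mismatch (UTF-8 bytes read as ISO-8859-1) looks different:
print(utf8_bytes.decode("iso-8859-1"))             # Ã©
```

So a � points to single-byte data being read as UTF-8, while a pair like Ã© points to UTF-8 data being read as a single-byte encoding.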

Pekka answered Oct 02 '22 05:10


The most likely explanation is that the page is not actually encoded as UTF-8, so the browser decodes the text using the wrong encoding.

You need to make sure that the actual document encoding matches the claimed encoding.
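As a rough way to test this, one can take the raw response body and try to decode it strictly with the charset named in the Content-Type header. The Python sketch below is an assumption-laden illustration, not the answerer's method: the function names are hypothetical, and the header value is the one quoted in the question. A failed decode proves a mismatch; a successful one is only consistent with the claim, since some encodings (ISO-8859-1, for example) accept any byte sequence.

```python
from email.message import Message
from typing import Optional

def declared_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent

def body_matches_declared(body: bytes, content_type: str) -> bool:
    """True if body decodes without errors using the declared charset."""
    charset = declared_charset(content_type)
    if charset is None:
        return False
    try:
        body.decode(charset)  # strict decoding: raises on invalid bytes
    except (UnicodeDecodeError, LookupError):
        return False
    return True

header = "text/html; charset=UTF-8"
print(body_matches_declared("é".encode("utf-8"), header))       # True
print(body_matches_declared("é".encode("iso-8859-1"), header))  # False
```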

Quentin answered Oct 02 '22 05:10