Why is IE failing to show UTF-8 encoded text?

I have some Chinese characters that I'm trying to display on a Kentico-powered website. The text is copied and pasted into Kentico's FCK editor, then saved, and it appears on the site. In Firefox, Chrome, and Safari, the characters appear exactly as expected. In IE 8 Standards mode, I see only boxes.

The text is UTF-8 encoded, and as far as I can tell, it is encoded correctly in the response from the server. There is a Content-Type: text/html; charset=utf-8 response header, and a <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> meta tag on the page too. When I download the HTML from the server and compare the bytes of the characters in question to the original UTF-8 text document, the bytes all match, except the HTML does not include a BOM.
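For anyone who wants to repeat that byte check, here is a minimal Python sketch of the comparison. The sample string and the helper names (`has_utf8_bom`, `contains_utf8_text`) are hypothetical stand-ins, not the actual page content or tooling used for the question.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the payload starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def contains_utf8_text(data: bytes, text: str) -> bool:
    """Return True if the UTF-8 encoding of `text` appears byte-for-byte in `data`."""
    return text.encode("utf-8") in data


# Hypothetical sample: two Chinese characters (U+4E2D, U+6587), written as
# escapes so the script itself stays pure ASCII.
sample = "\u4e2d\u6587"

# Stand-ins for the original text document (saved with a BOM) and the HTML
# downloaded from the server (no BOM).
reference_file = codecs.BOM_UTF8 + sample.encode("utf-8")
downloaded_html = ("<p>" + sample + "</p>").encode("utf-8")

print("reference has BOM:", has_utf8_bom(reference_file))
print("download has BOM: ", has_utf8_bom(downloaded_html))
print("bytes match:      ", contains_utf8_text(downloaded_html, sample))
```

In this sketch the BOM is the only difference between the two payloads, which mirrors the situation described above.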

This seems to be specific to IE 8 in Standards mode. In IE 8 Quirks mode it works, and it also works in IE 7 Standards mode and IE 7 Quirks mode. I'm not sure how Standards mode would cause this problem.

Strangely, if I view-source from IE, the characters show up in the source view correctly.

Any suggestions on what might be wrong here? Am I missing something obvious?

mrdrbob asked Aug 13 '10 17:08


2 Answers

I can't explain this in detail, but it is indeed a known problem.

Here's a small reproducible code snippet:

<!DOCTYPE html>
<html lang="en">
    <head><title>test</title></head>
    <body><p>&#65185;<br>0 0</p></body>
</html>

Save it as UTF-8 and view it in IE8: you see nothing. Replace 0 0 with 00 and reload the page, and it works fine! This is absolutely astonishing. Weirdly, replacing 0 0 with a a, or the <br> with </p><p>, fixes it as well. It probably has something to do with a bug in whitespace rendering.
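If you want to test both variants side by side, a small Python sketch like the one below writes them out as UTF-8 files without a BOM. The function names (`build_variant`, `write_variants`), the file names, and the use of a temporary directory are my own assumptions for illustration only.

```python
import tempfile
from pathlib import Path

# The test page from above; {tail} is the text that follows the <br>.
# &#65185; is the numeric character reference for code point 0xFEA1.
TEMPLATE = """<!DOCTYPE html>
<html lang="en">
    <head><title>test</title></head>
    <body><p>&#65185;<br>{tail}</p></body>
</html>
"""


def build_variant(tail: str) -> str:
    """Return the test page with `tail` placed after the <br>."""
    return TEMPLATE.format(tail=tail)


def write_variants(directory: Path) -> dict:
    """Write the failing ("0 0") and working ("00") pages as UTF-8 without a BOM."""
    variants = {"space.html": "0 0", "nospace.html": "00"}
    for name, tail in variants.items():
        (directory / name).write_text(build_variant(tail), encoding="utf-8")
    return variants


# Hypothetical output location: a fresh temporary directory.
out_dir = Path(tempfile.mkdtemp())
write_variants(out_dir)
print("Open these files in IE8:", out_dir)
```

Opening `space.html` and `nospace.html` from that directory in IE8 should let you compare the two cases directly.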

Sorry, I don't have authoritative resources proving this, but it is just more evidence that IE8 isn't as good as we expect it to be. Your best bet is to change the HTML and/or build it up step by step until it works at some point. If that fails, add the following meta tag to the head to force IE8 into IE7 mode:

<meta http-equiv="X-UA-Compatible" content="IE=7" /> 
BalusC answered Sep 25 '22 13:09


IE's default encoding is Western European (ISO), so you need to either change it manually to UTF-8 or force IE to use a given encoding like this:

  • HTML 4.01

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  • HTML 5

    <meta charset="UTF-8">

You also need to declare the language with the lang attribute on the <html> tag, for example for Chinese:

    <html lang="zh">
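As a rough illustration of why the declared encoding matters, this Python sketch decodes UTF-8 bytes with a Western European codec. Using Python's `latin-1` codec as a stand-in for IE's "Western European (ISO)" setting is an assumption on my part, and the sample text is hypothetical.

```python
# Two Chinese characters (U+4E2D, U+6587), written as escapes.
text = "\u4e2d\u6587"

# What the server sends when the page is saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# What a reader gets when it assumes a single-byte Western European encoding:
# every byte becomes its own character.
misread = utf8_bytes.decode("latin-1")

print(len(text), len(utf8_bytes), len(misread))  # 2 6 6

# The bytes themselves are intact, so decoding them as UTF-8 recovers the text.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)  # True
```

Each Chinese character here turns into three single-byte characters when misread, which is why declaring the charset explicitly is worth doing.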

Cassian answered Sep 25 '22 13:09