Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UTF-8 encoded html pages show � (questions marks) instead of characters

I have the standard XAMPP installation on win7 (x64). Having had my share of encoding troubles in a past project where mysql encoding did not match with the php enconding which in turn sometimes output html in other encodings, I decided to consistently encode everything using utf-8.

I'm just getting started with the html markup and am allready experiencing troubles.

  • My page is saved using utf-8 (no BOM, I think)
    //update: It turns out this was NOT the case. The file was actually saved with ISO_8859-1. I later found this out thanks to Sherm Pendleys answer. I had to go back and change my project settings (which were set to "ISO-8859-1") to the desired "UTF-8".
  • php is set per .htaccess to serve .php-pages in utf-8 with: AddCharset UTF-8 .php
  • html has a meta tag specifying: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  • To test I set used php header('Content-Type:text/html; charset=UTF-8');

The page is evidently served in utf-8 (firefox and chrome recognize it as such) but any special characters such as é, á or ¡ will just show as . Also when viewing the source code.

When dropping the encoding settings mentioned above all characters are rendered correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

How come? I'm very puzzled. I would have expected the exact opposite behavior.
Any advice is welcome, thanks!

edit: Hopefully this helps a bit more. This is the response header (as per firebug)

HTTP/1.1 200 OK Date: Sat, 26 Mar 2011 20:49:44 GMT Server: Apache/2.2.14 (Win32) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l mod_autoindex_color PHP/5.3.1 mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1 X-Powered-By: PHP/5.3.1 Content-Length: 91 Keep-Alive: timeout=5, max=99 Connection: Keep-Alive Content-Type: text/html; charset=utf-8 
like image 353
leugim Avatar asked Mar 26 '11 20:03

leugim


People also ask

How do I change my UTF-8 character set?

Click Tools, then select Web options. Go to the Encoding tab. In the dropdown for Save this document as: choose Unicode (UTF-8). Click Ok.

What does UTF-8 do in HTML?

The HTML5 Standard: Unicode UTF-8 Unicode enables processing, storage, and transport of text independent of platform and language. The default character encoding in HTML-5 is UTF-8.

How do I set character encoding in HTML?

From ASCII to UTF-8 ASCII defined 128 different characters that could be used on the internet: numbers (0-9), English letters (A-Z), and some special characters like ! $ + - ( ) @ < > . ISO-8859-1 was the default character set for HTML 4. This character set supported 256 different character codes.


1 Answers

When [dropping] the encoding settings mentioned above all characters [are rendered] correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

Then that's what you're really sending. None of the encoding settings in your bullet list will actually modify your output in any way; all they do is tell the browser what encoding to assume when interpreting what you send. That's why you're getting those �s - you're telling the browser that what you're sending is UTF-8, but it's really ISO-8859-1.

like image 116
Sherm Pendley Avatar answered Oct 10 '22 22:10

Sherm Pendley