I recently installed PHP 5.4 on my Ubuntu 12.10 via apt-get.
PHP Info shows: PHP Version 5.4.6-1ubuntu1
I installed all the common packages (mysql, pgsql, curl, etc.) and made no other changes, but I have a problem.
I like using the ISO-8859-1/latin1 encoding in my files and databases, because that is what gives me the best workflow. Now this causes a problem, because PHP does not seem to get along with exceptions whose messages are encoded that way.
To clarify, I created a test file like this:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
throw new Exception('é');
If the code above is in a UTF-8 file, everything is fine; with Xdebug enabled I get:
( ! ) Fatal error: Uncaught exception 'Exception' with message 'é' in /home/henrique/public/teste.php on line 5
( ! ) Exception: é in /home/henrique/public/teste.php on line 5
Call Stack
# Time Memory Function Location
1 0.0002 124212 {main}( ) ../teste.php:0
If the file is in ISO-8859-1 and Xdebug is enabled, the only problem is that the message is not displayed:
( ! ) Fatal error: in /home/henrique/public/teste.php on line 5
( ! ) Exception: in /home/henrique/public/teste.php on line 5
Call Stack
# Time Memory Function Location
1 0.0002 124436 {main}( ) ../teste.php:0
However, without Xdebug, all I get is this "very clarifying" message:
Fatal error: in /home/henrique/public/teste.php on line 5
Maybe it's a problem within Apache, because when I try the same using the command line, I get:
Stack trace:
#0 {main}
thrown in /home/henrique/public/teste.php on line 5
Fatal error: Uncaught exception 'Exception' with message '�' in /home/henrique/public/teste.php on line 5
Exception: � in /home/henrique/public/teste.php on line 5
Call Stack:
0.0002 121256 1. {main}() /home/henrique/public/teste.php:0
The message is still there; it's illegible, but it's there...
I also tried with Lighttpd 1.4.28 and the results were the same.
I also tried PHP 5.4's built-in server and got this in my terminal:
[Wed Jun 5 21:32:08 2013] PHP Fatal error: Uncaught exception 'Exception' with message '�' in /var/www/test2.php:9
Stack trace:
#0 {main}
thrown in /var/www/test2.php on line 9
[Wed Jun 5 21:32:08 2013] 127.0.0.1:55116 [200]: /test2.php - Uncaught exception 'Exception' with message '�' in /var/www/test2.php:9
Stack trace:
#0 {main}
thrown in /var/www/test2.php on line 9
But in the browser, the problem is still the same.
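Since the CLI output shows that the message bytes survive but are rendered as �, one way to check what is really inside the message is to dump it as hex instead of relying on the terminal or browser to decode it. A minimal debugging sketch, assuming you can wrap the throw in try/catch; `messageBytes()` is a hypothetical helper name:

```php
<?php
// Debugging sketch: dump the exception message as hex so the real bytes
// are visible regardless of how the browser or terminal decodes them.
// messageBytes() is a hypothetical helper, not part of PHP.
function messageBytes(Exception $e)
{
    return bin2hex($e->getMessage());
}

try {
    throw new Exception("\xE9"); // 'é' written as its single ISO-8859-1 byte
} catch (Exception $e) {
    echo messageBytes($e), PHP_EOL; // prints "e9"; a UTF-8 'é' would be "c3a9"
}
```

If you see `e9`, the message is Latin-1; `c3a9` means it is UTF-8.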
Have you tried this on a different server?
I think it's your configuration. I created a test file on my server; you can view it here: http://cai.tlacaelelrl.com/tests/test.php
The contents are:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
print 'Character encoding is: '.mb_internal_encoding();
throw new Exception('é');
The file is saved in that character set, and I also added the character set to the .htaccess file.
I am not sure whether it is because of Xdebug, but I could not test with it enabled.
Can you try adding this to your .htaccess file?
AddCharset ISO-8859-1 .php
The exception message in PHP is a string; that's no news to you.
Strings in PHP are binary. This effectively means that PHP does not care at all about the encoding they contain: a PHP string simply preserves any data that can be expressed as a sequence of octets (8 bits form one byte, which is one character in a PHP string if you use offset access like $string[10] to access the 11th character).
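To illustrate the point about strings being byte sequences: the same visible character `é` is one byte in ISO-8859-1 and two bytes in UTF-8, and offset access returns a single byte, not a whole character. A small sketch using only core functions; the byte escapes stand in for what an editor would save:

```php
<?php
// PHP strings are plain byte sequences; PHP does not track their encoding.
$latin1 = "\xE9";      // 'é' in ISO-8859-1: one byte
$utf8   = "\xC3\xA9";  // 'é' in UTF-8: two bytes

var_dump(strlen($latin1)); // int(1)
var_dump(strlen($utf8));   // int(2)

// Offset access returns one byte, here the first half of the UTF-8 'é'.
var_dump(bin2hex($utf8[0])); // string(2) "c3"
```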
All of this ensures that however you encode the message, it is passed into the output as-is.
So the only difference is how the output is displayed. Say the exception message string is Latin-1 encoded, you output it via your Apache server, and you view it in your browser. If your browser (for whatever reason, we don't care so far) displays it as UTF-8, you will see the question-mark diamond: �.
The same applies to a terminal that displays it as UTF-8.
Likewise if you save the output to a file and then open that file in your editor as UTF-8.
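The reason the diamond appears is that a lone Latin-1 byte such as `0xE9` is not a valid UTF-8 sequence, so a UTF-8 decoder substitutes U+FFFD. A minimal sketch of detecting that in PHP, relying on the fact that `preg_match()` with the `/u` modifier returns `false` for subjects that are not valid UTF-8; `isValidUtf8()` is a hypothetical helper name:

```php
<?php
// isValidUtf8() is a hypothetical helper. preg_match() with the /u
// modifier returns false (PREG_BAD_UTF8_ERROR) on malformed UTF-8.
function isValidUtf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("\xC3\xA9")); // bool(true): 'é' encoded as UTF-8
var_dump(isValidUtf8("\xE9"));     // bool(false): lone Latin-1 byte
```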
So how do you fix that? For your browser, check its documentation for how to tell it in which encoding the page you're currently viewing should be displayed. Every browser I know of has some kind of menu where you can specify it. The charset you use is common, so even older browsers support it.
The same applies to the terminal. You can set the locale of the shell as well as the encoding of the terminal. Consult the documentation of the shell you're using.
For the text file, I bet you already know how to deal with it: check which options your editor provides.
A final note of caution: if you want to properly analyze what your server returns for a request whose output contains the exception message, use your browser's developer tools to make the server's response headers visible. You will likely see that, compared to your previous configuration, the response now (wrongly) says the content is UTF-8 encoded while it is actually Latin-1. Fix that if you don't want to change the encoding in the browser manually. To do that, consult the PHP documentation and the documentation of your webserver.
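A minimal sketch of declaring the charset from PHP itself, assuming every page of the site really is Latin-1 (the Apache-side counterpart would be the `AddDefaultCharset` directive). Note that, per the bug explanation that follows, in PHP 5.4 this changes only the Content-Type header, not the charset PHP uses when escaping error messages:

```php
<?php
// Assumption: the whole site is served as ISO-8859-1.
// default_charset controls the charset PHP adds to its default
// Content-Type header.
ini_set('default_charset', 'ISO-8859-1');

// An explicit header does the same for this one response; it has to be
// sent before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=ISO-8859-1');
}
```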
[email protected] came up with an explanation:
https://bugs.php.net/bug.php?id=63426&edit=2
The reason it cannot easily be fixed is complex, but the cause is simple. Since 5.4, PHP's internal encoding is UTF-8, where it was latin1 before. Almost nothing else changed.
Every error message shown in an HTML context needs to have its entities converted. For that, the same functionality as in htmlspecialchars() is used. Before PHP 5.4 it was forced to use latin1; now it's forced to use UTF-8. This is by design. Using header() with a content-type, or default_charset, affects only the sending of the Content-Type header.
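The mechanism described in this comment can be reproduced with htmlspecialchars() directly: since PHP 5.4, it returns an empty string when the input is invalid in the given charset, unless `ENT_IGNORE` or `ENT_SUBSTITUTE` is passed. A small sketch, assuming the message consists of the single Latin-1 byte for `é` (PHP's error handler uses the same escaping internally rather than calling this function):

```php
<?php
$latin1 = "\xE9"; // 'é' in ISO-8859-1

// Escaping as UTF-8: the invalid byte makes the whole result empty,
// which is why the message vanishes in the browser.
var_dump(htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8')); // string(0) ""

// Escaping as ISO-8859-1 (the pre-5.4 behaviour) keeps the byte.
var_dump(htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1') === $latin1); // bool(true)

// ENT_SUBSTITUTE replaces the invalid byte with U+FFFD instead.
var_dump(bin2hex(htmlspecialchars($latin1, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8'))); // string(6) "efbfbd"
```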
Thus, your error text is in latin1, but UTF-8 will be used to convert entities, and that will die at the first invalid char. The relevant place in the code: http://lxr.php.net/xref/PHP_5_4/main/main.c#1083 ; subsequently, determine_charset() will deliver UTF-8 as the conversion charset. That's why your accented char is swallowed. And that's why Hui couldn't reproduce this: if you look at his earlier post, latin1 is indeed sent in the content-type, but obviously a UTF-8 encoded PHP script was used, so the error message is "Fatal error: Uncaught exception 'Exception' with message 'é' in ...". The current behaviour, however, doesn't force you to have scripts in UTF-8; in a latin1-encoded script you could still throw the exception using utf8_encode('é'). The reason it works with the CLI is that no HTML entities have to be encoded, so the chars are passed as-is to the output.
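A sketch of the workaround suggested above: convert the Latin-1 message to UTF-8 before throwing, so the UTF-8 escaping step accepts it. The comment uses utf8_encode(), which does this conversion but is deprecated as of PHP 8.2; this sketch uses iconv() instead and assumes the iconv extension is available. `latin1ToUtf8()` is a hypothetical helper name:

```php
<?php
// latin1ToUtf8() is a hypothetical helper; iconv() requires the iconv
// extension. utf8_encode() would do the same on older PHP versions.
function latin1ToUtf8($s)
{
    return iconv('ISO-8859-1', 'UTF-8', $s);
}

try {
    throw new Exception(latin1ToUtf8("\xE9")); // 'é' as saved in a Latin-1 file
} catch (Exception $e) {
    echo bin2hex($e->getMessage()), PHP_EOL; // prints "c3a9"
}
```

The page itself can stay Latin-1; only the strings passed to exceptions need converting.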
All this means the issue was always there, but it used to favour users whose default was ISO-8859-1; now users whose default is UTF-8 profit. Looking through the code, solving this might require a more global intrusion than this ticket alone warrants.
For htmlspecialchars() behaviour change see also bug #61354