PHP declare encoding

On the declare() page of the PHP manual:

Encoding

A script's encoding can be specified per-script using the encoding directive.

Example #3 Declaring an encoding for the script.

<?php
declare(encoding='ISO-8859-1');
// code here
?>
  1. What does this do exactly? How is the behaviour of the script affected by this directive?

  2. How does this differ from setting the directives mbstring.internal_encoding (before PHP 5.6) and default_charset (as of PHP 5.6) or using the mb_internal_encoding() function?

(I use both PHP 5.3 and 5.5. Currently my files are saved in UTF-8 and I send the header Content-Type: text/html; charset=utf-8 when serving HTML files.)

asked Dec 03 '14 19:12 by rink.attendant.6


2 Answers

As of PHP 5.6, the default_charset directive defaults to UTF-8. In some cases this is a problem for pages that declare latin1 in a meta tag. You can override the directive by calling ini_set('default_charset', 'iso-8859-1') in your scripts.

To do that, put this piece of code at the beginning of each PHP file that you want served as latin1:

example: index.php

<?php
  $server_root = realpath($_SERVER["DOCUMENT_ROOT"]);
  $config_serv = "$server_root/php/config.php";
  include("$config_serv");
?>

Then create a folder named "php" under your website's document root and put this piece of code into config.php:

example: config.php

<?php
  ##########################################################################
  # Server Directive - Override default_charset utf-8 to latin1 in php.ini #
  ##########################################################################
  @ini_set('default_charset', 'ISO-8859-1');
?>

If your php.ini is set to latin1 (ISO-8859-1) and you want to serve a UTF-8 (Unicode) page, you can force the encoding in the same way, using utf-8 instead of iso-8859-1:

example: config.php

<?php
  ##########################################################################
  # Server Directive - Override default_charset latin1 to utf-8 in php.ini #
  ##########################################################################
  @ini_set('default_charset', 'UTF-8');
?>

I hope you find this answer useful; this is how I solved my problem.

answered Oct 17 '22 04:10 by Alessandro


  1. What does this do exactly? How is the behaviour of the script affected by this directive?

From php.ini:

; Allows to set the default encoding for the scripts.  This value will be used
; unless "declare(encoding=...)" directive appears at the top of the script.
; Only affects if zend.multibyte is set.
; Default: ""
;zend.script_encoding =

From php.net:

handled as the file is being compiled....

A script's encoding can be specified per-script using the encoding directive.

In other words, if the zend.multibyte directive is enabled, an optional declare directive at the top of each PHP file can declare that file's character encoding. Your PHP files can therefore be saved in different encodings, as long as each one declares its own encoding at the top, and the string literals in each file are transparently converted at compile time to the internal_encoding set in php.ini (tested in PHP 7.4.6). The default_charset and internal_encoding configuration options are not changed, and your code is unaware of the original encodings because the conversions have already taken place at compile time.
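Because the directive only matters when zend.multibyte is enabled, it can help to check the relevant settings before relying on it. The following is a minimal sketch; the helper name multibyte_scripts_enabled() is made up for illustration, and the printed values depend entirely on your own php.ini.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether zend.multibyte is on,
// which the php.ini comment above names as the precondition for
// declare(encoding=...) and zend.script_encoding to have any effect.
function multibyte_scripts_enabled(): bool
{
    // ini_get() returns a string (or false for an unknown option);
    // FILTER_VALIDATE_BOOLEAN maps "1"/"On" to true and ""/"0"/false to false.
    return filter_var(ini_get('zend.multibyte'), FILTER_VALIDATE_BOOLEAN);
}

// Show the settings discussed in this answer as currently configured.
echo 'zend.multibyte:       ', multibyte_scripts_enabled() ? 'on' : 'off', PHP_EOL;
echo 'zend.script_encoding: ', var_export(ini_get('zend.script_encoding'), true), PHP_EOL;
echo 'internal_encoding:    ', var_export(ini_get('internal_encoding'), true), PHP_EOL;
echo 'default_charset:      ', var_export(ini_get('default_charset'), true), PHP_EOL;
```

If the first line prints "off", a declare(encoding=...) statement in your files should not be expected to convert anything.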

  2. How does this differ from setting the directives mbstring.internal_encoding (before PHP 5.6) and default_charset (as of PHP 5.6) or using the mb_internal_encoding() function?

internal_encoding directive (formerly mbstring.internal_encoding)

The encoding declared at the top of each file is the actual encoding of that file, while the internal_encoding setting in php.ini is the desired encoding. So if you want your code to see UTF-8 but your PHP files are saved in Windows-1252, set internal_encoding in php.ini to UTF-8 and put a declare directive at the top of each file stating that it is encoded as Windows-1252. The string literals in those files will then be converted to UTF-8 at compile time. (Tested in PHP 7.4.6)
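To make the Windows-1252 to UTF-8 case concrete, here is a small run-time sketch of the same byte-level conversion. It is only an illustration: it uses mb_convert_encoding() rather than the compile-time mechanism, and the helper name to_internal() is hypothetical.

```php
<?php
// Hypothetical helper: mimics at run time the conversion from a file's
// declared encoding to the desired internal encoding.
function to_internal(string $bytes, string $declared, string $internal = 'UTF-8'): string
{
    // mb_convert_encoding(string, to_encoding, from_encoding)
    return mb_convert_encoding($bytes, $internal, $declared);
}

// "é" is the single byte 0xE9 in Windows-1252.
$literal = "\xE9";

// After conversion it becomes the two-byte UTF-8 sequence 0xC3 0xA9.
$converted = to_internal($literal, 'Windows-1252');

echo bin2hex($literal), ' -> ', bin2hex($converted), PHP_EOL; // e9 -> c3a9
```

With zend.multibyte enabled and the file declared as Windows-1252, the code in that file would see the converted bytes without calling anything like this itself.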

php.net:

This setting is used for multibyte modules such as mbstring and iconv.

php.ini:

If empty, default_charset is used.

For more information, see the mb_internal_encoding function section below.

mb_internal_encoding function

Calling mb_internal_encoding() at run time tells the mb_* functions which multibyte encoding you are using, so that a function like mb_strtolower can recognize your multibyte characters and replace them with their lowercase equivalents. If you don't set it at run time, the mb_* functions assume the encoding given by the internal_encoding directive in php.ini.

The mb_internal_encoding function executes at runtime and therefore can't be used to tell PHP what each PHP file's declared encoding should be converted to at compile time. (See above.)
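As a short illustration of the run-time behaviour described above, the sketch below sets the internal encoding and then lowercases a UTF-8 string. The sample word is written as escaped bytes so the example does not depend on how this file itself is saved; it assumes the mbstring extension is loaded.

```php
<?php
// Tell the mb_* functions that strings are UTF-8 from here on.
mb_internal_encoding('UTF-8');

// "ÉCOLE" written as UTF-8 bytes (É is 0xC3 0x89).
$word = "\xC3\x89COLE";

// No encoding argument is passed, so the internal encoding set above is used.
$lower = mb_strtolower($word);
echo bin2hex($lower), PHP_EOL; // c3a9636f6c65, i.e. "école"

// Byte-oriented strlen() and character-oriented mb_strlen() disagree
// as soon as a multibyte character is present.
echo strlen($word), ' bytes, ', mb_strlen($word), ' characters', PHP_EOL; // 6 bytes, 5 characters
```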

From PHP.net:

[Set/Get] the character encoding name used for the HTTP input character encoding conversion, HTTP output character encoding conversion, and the default character encoding for string functions defined by the mbstring module. You should notice that the internal encoding is totally different from the one for multibyte regex.

default_charset directive

Setting the default_charset directive tells PHP what charset to put in the Content-Type HTTP response header, for example Content-Type: text/html; charset=UTF-8

This directive also tells PHP which character encoding to assume in certain functions such as htmlspecialchars and htmlentities. For example, if your default_charset is UTF-8 but your database returns latin1 data, htmlspecialchars will have trouble with non-ASCII characters unless Windows-1252 is specified as the encoding, because Windows-1252 text contains byte sequences that are invalid in UTF-8. default_charset is also used as the internal_encoding if internal_encoding is not explicitly set.
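The htmlspecialchars problem can be reproduced with a single legacy byte. This is a sketch under stated assumptions: the sample string stands in for data read from a latin1 database, and the flags and encoding are passed explicitly so the result does not depend on default_charset or on version-specific default flags.

```php
<?php
// "café" as Windows-1252 bytes: the final byte 0xE9 is not valid UTF-8 on its own.
$legacy = "caf\xE9";

// Interpreted as UTF-8 without ENT_SUBSTITUTE or ENT_IGNORE,
// the invalid byte sequence makes the call return an empty string.
$asUtf8 = htmlspecialchars($legacy, ENT_QUOTES, 'UTF-8');

// Naming the real encoding lets the bytes pass through unchanged.
$asCp1252 = htmlspecialchars($legacy, ENT_QUOTES, 'Windows-1252');

// Alternatively, convert to UTF-8 first and escape afterwards.
$utf8 = mb_convert_encoding($legacy, 'UTF-8', 'Windows-1252');
$escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

var_dump($asUtf8 === '');        // bool(true)
var_dump($asCp1252 === $legacy); // bool(true)
echo bin2hex($escaped), PHP_EOL; // 636166c3a9
```

Which of the two fixes fits better depends on whether the page is ultimately served as Windows-1252 or as UTF-8.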

From php.net

default_charset string

In PHP 5.6 onwards, "UTF-8" is the default value and its value is used as the default character encoding for htmlentities(), html_entity_decode() and htmlspecialchars() if the encoding parameter is omitted. The value of default_charset will also be used to set the default character set for iconv functions if the iconv.input_encoding, iconv.output_encoding and iconv.internal_encoding configuration options are unset, and for mbstring functions if the mbstring.http_input mbstring.http_output mbstring.internal_encoding configuration option is unset.

All versions of PHP will use this value as the charset within the default Content-Type header sent by PHP if the header isn't overridden by a call to header().

Setting default_charset to an empty value is not recommended.

answered Oct 17 '22 04:10 by PHP Guru