
Change the default character set of PHP functions like "htmlspecialchars"

I am using PHP 5.2.6 and my app's character set is UTF-8.

Now, how should I change PHP's default character set? NOT the one that specifies the output's MIME type and character set,

but the one that applies to all PHP functions like htmlspecialchars, htmlentities, etc.

I know there is a parameter in those functions that takes the character set of the input string. But I don't want to specify it in every call, and if I forget it somewhere, it will be a mess.

I also know that I can wrap those functions and create my own wrapper like:

function myHtmlize($str)
{
  return htmlspecialchars($str, ENT_COMPAT, 'UTF-8');
}

I don't like this solution either.

I really want to tell PHP to use 'UTF-8' as the character set by default, not 'iso-8859-1'.

Is it possible?

Sabya asked Jul 24 '09 07:07



2 Answers

Like this one? http://us2.php.net/manual/en/function.setlocale.php

* LC_ALL for all of the below
* LC_COLLATE for string comparison, see strcoll()
* LC_CTYPE for character classification and conversion, for example strtoupper()
* LC_MONETARY for localeconv()
* LC_NUMERIC for decimal separator (See also localeconv())
* LC_TIME for date and time formatting with strftime()
* LC_MESSAGES for system responses (available if PHP was compiled with libintl)
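
As a rough sketch of how one of these categories could be set, the snippet below asks for a UTF-8 locale for LC_CTYPE. The locale names are assumptions (which ones are installed depends on the system), and whether this affects htmlspecialchars() at all depends on how the charset is detected, so treat it as an illustration of the call rather than a confirmed fix.

```php
<?php
// Candidate UTF-8 locale names; these are assumptions and may not be
// installed on a given system.
$candidates = array('en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');

// setlocale() accepts an array and tries each name until one succeeds.
// It returns the new locale string, or false if none could be set.
$active = setlocale(LC_CTYPE, $candidates);

if ($active === false) {
    echo "No UTF-8 locale available; LC_CTYPE unchanged\n";
} else {
    echo "LC_CTYPE is now: $active\n";
}
```

Checking the return value matters here, because setlocale() fails silently when the requested locale is missing.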
Ahmet Kakıcı answered Sep 23 '22 03:09


There is a C function, determine_charset(char *charset_hint ...), which is used to find the "right" charset based on

  • what has been passed as charset_hint
  • the setting of mb_internal_encoding
  • default_charset
  • compile-time CODESET
  • last but not least: LC_CTYPE locale

in that order, and depending on whether certain extensions are built in.
The "problem" is that when you call htmlentities('xyz'), determine_charset() is called with charset_hint=NULL, and the first thing this function does is:

/* Guarantee default behaviour for backwards compatibility */
if (charset_hint == NULL)
    return cs_8859_1;

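So you have to pass at least an empty string as the charset, e.g. htmlentities('xyz', ENT_QUOTES, ''), so that charset_hint is not NULL and the detection described above runs.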
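
To make the difference visible, here is a minimal sketch comparing the empty-charset call with explicit charsets. It assumes default_charset can be changed with ini_set() (it could equally be set in php.ini), and the sample string is made up for illustration. What the empty-charset call returns depends on the PHP version and on which extensions are built in, so no output is claimed for it.

```php
<?php
// Assumption: default_charset is set at runtime here; php.ini works as well.
ini_set('default_charset', 'UTF-8');

// "café <b>" encoded as UTF-8 (the é is the two bytes 0xC3 0xA9).
$input = "caf\xC3\xA9 <b>";

// Empty charset: ask PHP to detect the charset rather than fall back
// to the ISO-8859-1 default described above.
$detected = htmlentities($input, ENT_QUOTES, '');

// Explicit charsets, for comparison.
$asUtf8   = htmlentities($input, ENT_QUOTES, 'UTF-8');      // caf&eacute; &lt;b&gt;
$asLatin1 = htmlentities($input, ENT_QUOTES, 'ISO-8859-1'); // caf&Atilde;&copy; &lt;b&gt;

echo $detected, "\n", $asUtf8, "\n", $asLatin1, "\n";
```

If $detected matches $asUtf8, the detection picked up UTF-8. If it matches $asLatin1, the two bytes of é were treated as separate ISO-8859-1 characters.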

VolkerK answered Sep 26 '22 03:09