When my PHP script processes UTF-8 text containing non-ASCII characters, some PHP string functions such as strtolower() don't work correctly.
I could use mb_strtolower, but this script can be run on all sorts of different platforms and configurations, and the multibyte string extension might not be available. I could check whether the function exists before use, but I have string functions littered throughout my code and would rather not replace every instance.
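For comparison, the function_exists() check mentioned above can at least be kept in one place. This is a minimal sketch; the helper name lower_utf8() is hypothetical and not part of PHP. Note that the fallback branch only lowercases ASCII letters.

```php
<?php
// Hypothetical helper: lower_utf8() is an illustrative name, not a PHP built-in.
// Uses mbstring when it is available; otherwise falls back to strtolower(),
// which only lowercases ASCII A-Z when LC_CTYPE is "C" (and always, as of PHP 8.2).
function lower_utf8(string $s): string
{
    if (function_exists('mb_strtolower')) {
        return mb_strtolower($s, 'UTF-8');
    }
    return strtolower($s);
}

echo lower_utf8("HELLO"), "\n"; // hello
```

The trade-off is that every call site still has to be switched to the helper once, which is exactly what the question hopes to avoid.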
Someone suggested using setlocale(LC_CTYPE, 'C'), which he says makes the string functions work correctly. That sounds fine, but I don't want to introduce the change without understanding exactly what it does. I have used setlocale() to change the formatting of numbers before, but I have never used the LC_CTYPE flag, and I don't really understand what it does. What does the value 'C' mean?
The setlocale function installs the specified system locale, or a portion of it, as the new C locale. The modifications remain in effect and influence the execution of all locale-sensitive C library functions until the next call to setlocale.
The LC_ALL variable overrides all of the locale variables printed by the command 'locale'. It is a convenient way of specifying a language environment with one variable, without having to set each LC_* variable separately. Processes launched in that environment will run in the specified locale.
The setlocale() function is used to set or query the program's current locale. If locale is not NULL, the program's current locale is modified according to the arguments. The argument category determines which parts of the program's current locale should be modified.
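PHP exposes the same set-or-query behaviour through its own setlocale(); passing the string "0" as the locale only queries the current setting. A minimal sketch, assuming you want LC_CTYPE switched to "C" for one section of code and then restored:

```php
<?php
// Query the current LC_CTYPE setting without changing it ("0" means query only).
$previous = setlocale(LC_CTYPE, '0');

// Switch only the character-classification category to the "C" locale.
// setlocale() returns the new locale name, or false if the name was rejected.
$now = setlocale(LC_CTYPE, 'C');
var_dump($now); // string(1) "C"

// ... locale-sensitive string work would go here ...

// Restore whatever was active before.
if ($previous !== false) {
    setlocale(LC_CTYPE, $previous);
}
```

The PHP manual warns that locale information is maintained per process, not per thread, so on a multithreaded server API a change made here can be seen by other requests running in the same process.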
To enable UTF-8 mode, use ".UTF8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".UTF8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
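Because UTF-8 locale names differ between platforms, a portable script might try several. PHP's setlocale() accepts an array and sets the first entry the system recognizes. This is a sketch only: the candidate names below are assumptions, and which of them exist depends on the OS and the installed locales.

```php
<?php
// Candidate names are assumptions: ".UTF8" is the Windows UCRT form,
// the others are common on Linux/macOS but may not be installed.
$candidates = ['.UTF8', 'C.UTF-8', 'en_US.UTF-8'];

// setlocale() tries each array entry in order and returns the first one
// that succeeds, or false if none of them is available.
$chosen = setlocale(LC_CTYPE, $candidates);

if ($chosen === false) {
    // No UTF-8 locale found; fall back to the always-available "C" locale.
    $chosen = setlocale(LC_CTYPE, 'C');
}
echo "LC_CTYPE is now: ", $chosen, "\n";
```

Even when a UTF-8 locale is selected, strtolower() still works one byte at a time, so this alone does not make it lowercase multibyte letters.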
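C means "use the minimal locale built into the C library": it is the default locale defined by the C standard, the one every C program starts in, hence the name. It is not a UTF-8 locale. With LC_CTYPE set to C, only the ASCII letters are classified as letters, so strtolower() changes A-Z and copies every byte above 0x7F unchanged. That is why the suggestion "works": a locale-dependent strtolower() can no longer corrupt UTF-8 by lowercasing individual bytes of a multibyte sequence, but non-ASCII letters such as "É" are not lowercased either.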
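A small demonstration of that behaviour, assuming the script file itself is saved as UTF-8:

```php
<?php
// With LC_CTYPE set to "C", strtolower() only maps ASCII A-Z; every byte >= 0x80
// (i.e. every byte of a multibyte UTF-8 sequence) is copied through unchanged.
setlocale(LC_CTYPE, 'C');

$input = "ÉCOLE";            // "É" is the two bytes C3 89 in UTF-8
$lower = strtolower($input);

echo $lower, "\n";           // École  (É untouched, COLE -> cole)
echo bin2hex($lower), "\n";  // c389636f6c65  (the UTF-8 bytes of É are intact)
```

As of PHP 8.2, strtolower() is locale-insensitive and always behaves this way, regardless of LC_CTYPE.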
If you are using multibyte character sets such as UTF-8, you cannot rely on the regular string functions; you need their mb_* counterparts. However, most PHP installations have this extension (mbstring) enabled.
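The difference is easy to see with a length check: the regular functions count bytes, while the mb_* functions count characters in the given encoding. A minimal sketch, guarded in case mbstring is missing:

```php
<?php
// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
$word = "école"; // 5 characters, but 6 bytes in UTF-8 ("é" is C3 A9)

echo strlen($word), "\n"; // 6

if (extension_loaded('mbstring')) {
    echo mb_strlen($word, 'UTF-8'), "\n";     // 5
    echo mb_strtoupper($word, 'UTF-8'), "\n"; // ÉCOLE
} else {
    echo "mbstring is not loaded\n";
}
```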