
Are string functions ASCII-safe in PHP?

Some PHP string functions (such as strtoupper) are locale-dependent. It is still not clear to me whether the locale matters when I know for certain that a particular string consists of ASCII (0-127) characters only. Am I guaranteed that strtoupper('abc..xyz') will always return ABC..XYZ regardless of locale? Do PHP string functions behave the same in the ASCII range regardless of locale?

While the answer about strtoupper is important to me, the question is more general and concerns the whole string function library.

I want to be sure that a user-selected locale (on a multi-language site) will not break my core functionality, which has nothing to do with internationalization.

Karolis asked May 31 '11 00:05


2 Answers

Do PHP string functions work the same in ASCII range independent from locale?

No, I'm afraid not. The primary counterexample is the dreaded Turkish dotted-I:

// Switch character classification to a Turkish locale (the exact locale name varies by system).
setlocale(LC_CTYPE, "tr_TR");
echo strtoupper('hi!');

-> 'H\xDD!' ('Hİ!' in ISO-8859-9)

In the worst case you may have to provide your own locale-independent string handling. Calling setlocale to revert to C or some other locale is kind of a fix, but the POSIX process-level locale model is a really bad fit for modern client/server apps: the setting is shared by the whole process, so on a threaded server changing it for one request can affect others.
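One way to provide locale-independent handling is sketched below. It is only an illustration, not a definitive fix: ascii_strtoupper is a hypothetical helper name, not a PHP built-in. It relies on the three-argument form of strtr, which translates bytes one-for-one and does not consult the locale. As far as I know, PHP 8.2 and later make strtoupper itself ASCII-only, so a helper like this mainly matters on older versions.

```php
<?php
// Hypothetical helper (not part of PHP): upper-cases only the bytes a-z
// and leaves every other byte untouched, whatever the current locale is.
function ascii_strtoupper(string $s): string
{
    // Three-argument strtr maps each byte in the "from" list to the byte
    // at the same position in the "to" list; no locale lookup is involved.
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

echo ascii_strtoupper('hi!'), "\n"; // HI!
```

Because only the 26 ASCII letters are mapped, bytes above 127 pass through unchanged, so the helper is also safe to run on UTF-8 input.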

bobince answered Oct 03 '22 21:10


PHP string functions treat one byte as one character. In the ASCII range 0-127 that is fine.

To safely handle multiple languages using UTF-8, use the mb_*() functions, a UTF-8 library, or wait until 2030 when PHP6 is released.
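The byte-versus-character difference can be seen in a short sketch. It assumes the mbstring extension is installed and that the string is UTF-8; the sample word is my own choice, not from the question.

```php
<?php
// "naïve" written with an escape so the source file's encoding does not matter;
// in UTF-8 the character ï (U+00EF) occupies two bytes.
$s = "na\u{00EF}ve";

echo strlen($s), "\n";             // 6: strlen counts bytes
echo mb_strlen($s, 'UTF-8'), "\n"; // 5: mb_strlen counts characters

// mb_strtoupper uses mbstring's own Unicode case tables, not the process locale.
echo mb_strtoupper('hi!', 'UTF-8'), "\n"; // HI!
```

Passing the encoding explicitly avoids depending on the configured internal encoding.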

alex answered Oct 03 '22 20:10