How come the lengths of the following strings are different although the number of characters in each string is the same?
echo strlen("馐 馑 馒 馓 馔 馕 首 馗 馘")."<BR>";
echo strlen("Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ")."<BR>";
Outputs
35
26
The characters in the first batch take up three bytes each in UTF-8, because their code points are up around 39,000 (馐 is U+9990, i.e. 39312), whereas those in the second group take only two bytes each, being around 400 (Ɛ is U+0190). Each space is a single byte, so the totals are 9 × 3 + 8 = 35 and 9 × 2 + 8 = 26. (The number of bytes/octets required per character is discussed in the UTF-8 Wikipedia article.)
strlen counts the number of bytes taken by the string, not the number of characters, which is why it gives such odd results for Unicode text.
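To see the per-character byte counts for yourself, here is a small sketch. It assumes the PHP file is saved as UTF-8 (otherwise the literals would hold different bytes); the variable names are just for illustration.

```php
<?php
// Assumption: this source file is saved as UTF-8, so each literal holds UTF-8 bytes.
$samples = ["馐", "Ɛ", " "];
$bytes = [];
foreach ($samples as $ch) {
    // strlen() reports bytes; bin2hex() shows which bytes they are.
    $bytes[$ch] = strlen($ch);
    echo bin2hex($ch), " => ", $bytes[$ch], " byte(s)\n";
}

// Nine symbols plus eight single-byte spaces in each string from the question.
$firstTotal  = 9 * $bytes["馐"] + 8 * $bytes[" "];
$secondTotal = 9 * $bytes["Ɛ"] + 8 * $bytes[" "];
echo $firstTotal, "\n";
echo $secondTotal, "\n";
```

The two totals should match the 35 and 26 that strlen printed in the question.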
I am no PHP expert, but it seems that strlen counts bytes... there is mb_strlen, which counts characters...
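A quick side-by-side of the two functions on the strings from the question, as a sketch. It assumes the mbstring extension is enabled and the file is saved as UTF-8; the encoding is passed explicitly so the result does not depend on the configured internal encoding.

```php
<?php
// Assumptions: mbstring is available and this file is saved as UTF-8.
$cjk   = "馐 馑 馒 馓 馔 馕 首 馗 馘";
$latin = "Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ";

// Byte counts differ because the symbols need 3 vs 2 bytes each.
echo strlen($cjk), "\n";               // 35
echo strlen($latin), "\n";             // 26

// Character counts are the same: 9 symbols + 8 spaces.
echo mb_strlen($cjk, 'UTF-8'), "\n";   // 17
echo mb_strlen($latin, 'UTF-8'), "\n"; // 17
```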
EDIT - for further reference on how multi-byte encoding works, see http://en.wikipedia.org/wiki/Variable-width_encoding and, for UTF-8 in particular, http://en.wikipedia.org/wiki/UTF-8