Lengths of strings in Unicode are different

Question

How come the lengths of the following strings are different, although the number of characters in each string is the same?

echo strlen("馐 馑 馒 馓 馔 馕 首 馗 馘")."<BR>";
echo strlen("Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ")."<BR>";

Outputs

35
26

Niet the Dark Absol · Accepted Answer

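The characters in the first string take three bytes each in UTF-8, because their code points are up around 39,000 (U+9990 to U+9998), whereas those in the second string take only two bytes each, with code points around 400 (U+0190 to U+0198). Each string also contains eight single-byte spaces, so the totals are 9 × 3 + 8 = 35 and 9 × 2 + 8 = 26. (The number of bytes/octets required per character is discussed in the UTF-8 Wikipedia article.)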

strlen counts the number of bytes in the string, not the number of characters, which is why it gives such odd-looking results for multi-byte UTF-8 text.
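
To make the byte arithmetic concrete, here is a minimal sketch of the UTF-8 size ranges, assuming PHP 7 or later. The function name utf8ByteLength is a hypothetical helper written for this illustration, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns how many bytes
// UTF-8 needs to encode a single Unicode code point.
function utf8ByteLength(int $codePoint): int
{
    if ($codePoint < 0x80) {
        return 1; // U+0000..U+007F: ASCII, including the space
    }
    if ($codePoint < 0x800) {
        return 2; // U+0080..U+07FF: includes U+0190 (Ɛ)
    }
    if ($codePoint < 0x10000) {
        return 3; // U+0800..U+FFFF: includes U+9990 (馐)
    }
    return 4;     // U+10000..U+10FFFF
}

// Nine characters plus eight separating spaces, as in the question.
$firstTotal  = 9 * utf8ByteLength(0x9990) + 8 * utf8ByteLength(0x20);
$secondTotal = 9 * utf8ByteLength(0x0190) + 8 * utf8ByteLength(0x20);

echo $firstTotal, "\n";  // 35
echo $secondTotal, "\n"; // 26
```

The two totals match the values strlen reported in the question.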

Yahia · Answer

I am no PHP expert, but it seems that strlen counts bytes, whereas mb_strlen counts characters.
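
As a rough illustration of the difference, the sketch below runs both functions on the strings from the question. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names are made up for this example.

```php
<?php
// Assumes the mbstring extension is available and this file is UTF-8 encoded.
$first  = "馐 馑 馒 馓 馔 馕 首 馗 馘";
$second = "Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ";

// strlen counts bytes.
echo strlen($first), "\n";  // 35
echo strlen($second), "\n"; // 26

// mb_strlen counts characters in the given encoding:
// nine symbols plus eight spaces in each string.
echo mb_strlen($first, 'UTF-8'), "\n";  // 17
echo mb_strlen($second, 'UTF-8'), "\n"; // 17
```

Passing the encoding explicitly avoids depending on the configured internal encoding.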

EDIT: for further reference on how multi-byte encodings work, see http://en.wikipedia.org/wiki/Variable-width_encoding and, for UTF-8 in particular, http://en.wikipedia.org/wiki/UTF-8


Tags:

php

unicode

Imran Omar Bukhsh
