PHP String Function with non-English languages

Question

I was trying the range() function with non-English characters, and it is not working.

$i = 0;
foreach (range('क', 'म') as $ab) {
    ++$i;
    $alphabets[$ab] = $i;
}

Output: à =1

These are letters of the Hindi (India) alphabet. The loop iterates only once, as the output shows.

I don't know what to do about this.

If possible, please tell me how to handle this, and what I should do first before working with non-English text in any PHP function.

Jon · Accepted Answer

Short answer: it's not possible to use range like that.

Explanation

You are passing the string 'क' as the start of the range and 'म' as the end. You are getting only one character back, and that character is à.

You are getting back à because your source file is encoded (saved) in UTF-8. One can tell this because à is code point U+00E0, and 0xE0 is also the first byte of the UTF-8 encoded form of 'क' (which is 0xE0 0xA4 0x95). PHP strings are plain byte sequences with no notion of encoding, so range() just takes the first byte it sees in the string and uses that as the "start" character.

You are getting back only à because the UTF-8 encoded form of 'म' also starts with 0xE0 (so PHP also thinks that the "end character" is 0xE0 or à).
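The byte-level picture can be checked directly. A minimal sketch, assuming the script file itself is saved as UTF-8:

```php
<?php
// Assumes this file is saved as UTF-8.
$start = 'क'; // U+0915
$end   = 'म'; // U+092E

// Show the raw bytes PHP actually stores for each string.
echo bin2hex($start), "\n"; // e0a495
echo bin2hex($end), "\n";   // e0a4ae

// Both strings begin with the same byte, so a byte-wise range from the
// first byte of one to the first byte of the other has a single element.
var_dump($start[0] === $end[0]); // bool(true)
var_dump(strlen($start));        // int(3): three bytes, one character
```

Each character occupies three bytes, and only the first of them (0xE0) ever reaches the range logic.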

Solution

You can write range as a for loop yourself, as long as there is a function that returns the Unicode code point of a UTF-8 character (and one that does the reverse). I searched and found these:

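// Returns the UTF-8 character with code point $intval.
// Works for the Basic Multilingual Plane only (code points up to 0xFFFF),
// because pack('n') produces a single 16-bit unit.
function unichr($intval) {
    return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
}

// Returns the code point for a UTF-8 character (same BMP-only limit,
// since UCS-2 is a 16-bit encoding).
function uniord($u) {
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1)); // low byte
    $k2 = ord(substr($k, 1, 1)); // high byte
    return $k2 * 256 + $k1;
}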

With the above, you can now write:

$alphabet = array();
$last = uniord('म'); // compute the end code point once, not on every iteration
for ($char = uniord('क'); $char <= $last; ++$char) {
    $alphabet[] = unichr($char);
}

print_r($alphabet);
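On PHP 7.2 or later with the mbstring extension, the built-in mb_ord() and mb_chr() do the same job as the two helpers, and they also handle code points above 0xFFFF. A minimal sketch under those assumptions; utf8_range() is a hypothetical name chosen for illustration, not a PHP built-in:

```php
<?php
// Hypothetical helper (not a PHP built-in): range() for single UTF-8 characters.
// Assumes PHP >= 7.2 with mbstring, and that each argument is exactly one character.
function utf8_range(string $first, string $last): array
{
    $from = mb_ord($first, 'UTF-8');
    $to   = mb_ord($last, 'UTF-8');

    $chars = [];
    // range() is used only on integers here, which it handles correctly.
    foreach (range($from, $to) as $codePoint) {
        $chars[] = mb_chr($codePoint, 'UTF-8');
    }
    return $chars;
}

$alphabet = utf8_range('क', 'म');
echo count($alphabet), "\n"; // 26
echo implode(' ', $alphabet), "\n";
```

The result contains every code point from U+0915 to U+092E in code point order, so it is only as "alphabetical" as the Unicode block layout itself.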


mario · Answer

The lazy solution would be to use html_entity_decode(), and to use range() only for the numeric ranges it was originally intended for (that it works with ASCII letters is a bit silly anyway):

$i = 0;
$alphabets = array();
foreach (range(0x0915, 0x092E) as $char) {
    // "&#N;" is a numeric character reference for code point N
    $char = html_entity_decode("&#$char;", ENT_COMPAT, "UTF-8");
    $alphabets[$char] = ++$i;
}
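This works because &#N; is a numeric character reference for code point N, which html_entity_decode() converts into the corresponding UTF-8 bytes. A quick check of the two endpoints, assuming the file is saved as UTF-8:

```php
<?php
// 0x0915 is 2325 in decimal; 0x092E is 2350.
$first = html_entity_decode('&#2325;', ENT_COMPAT, 'UTF-8');
$last  = html_entity_decode('&#' . 0x092E . ';', ENT_COMPAT, 'UTF-8');

var_dump($first === 'क'); // bool(true)
var_dump($last === 'म');  // bool(true)

// The integer range covers both endpoints inclusively.
echo count(range(0x0915, 0x092E)), "\n"; // 26
```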

Tags: php, utf-8

Satya Prakash
