I was trying to use the range() function with non-English text, and it is not working.
$i = 0;
$alphabets = [];
foreach (range('क', 'म') as $ab) {
    ++$i;
    $alphabets[$ab] = $i;
}
Output: à =1
These are letters of the Hindi (India) alphabet. As the output shows, the loop iterates only once.
I don't know what to do about this.
So, if possible, please tell me how to fix it, and what I should do first before working with non-English text in any PHP function.
Short answer: it's not possible to use range() like that.
You are passing the string 'क' as the start of the range and 'म' as the end. You are getting only one character back, and that character is à.
You are getting back à because your source file is encoded (saved) in UTF-8. One can tell this from the fact that à is code point U+00E0, while 0xE0 is also the first byte of the UTF-8 encoded form of 'क' (which is 0xE0 0xA4 0x95). Sadly, PHP has no notion of encodings, so it just takes the first byte it sees in the string and uses that as the "start" character.
You are getting back only à because the UTF-8 encoded form of 'म' also starts with 0xE0 (so PHP also thinks that the "end character" is 0xE0, or à).
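To make the byte-level explanation concrete, here is a small sketch that dumps the bytes of both endpoints. It assumes the script itself is saved as UTF-8. The `@` is there because newer PHP versions may emit a warning when range() receives a multi-byte string. The variable names are only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8.
$start = 'क';
$end   = 'म';

// bin2hex() shows every byte of the string.
$startHex = bin2hex($start);
$endHex   = bin2hex($end);

// ord() looks only at the first byte.
$firstByteStart = ord($start);
$firstByteEnd   = ord($end);

// range() also uses only the first byte of each endpoint.
// "@" hides the warning that newer PHP versions may raise here.
$result = @range($start, $end);

echo $startHex, ' ', $endHex, PHP_EOL;
echo dechex($firstByteStart), ' ', dechex($firstByteEnd), PHP_EOL;
echo count($result), ' element(s), first byte of result: ', bin2hex($result[0]), PHP_EOL;
```

Both strings begin with the same byte, so range() sees identical start and end values and returns a single one-byte element.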
You can write range() as a for loop yourself, as long as there is some function that returns the Unicode code point of a UTF-8 character (and one that does the reverse). So I googled and found these:
// Returns the UTF-8 character with code point $intval.
// pack('n') writes a 16-bit value, so this only handles code points up to U+FFFF.
function unichr($intval) {
    return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
}
// Returns the code point for a UTF-8 character (also limited to U+FFFF, since it goes through UCS-2)
function uniord($u) {
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1)); // low byte
    $k2 = ord(substr($k, 1, 1)); // high byte
    return $k2 * 256 + $k1;
}
With the above, you can now write:
$alphabet = [];
for ($char = uniord('क'); $char <= uniord('म'); ++$char) {
    $alphabet[] = unichr($char);
}
print_r($alphabet);
The lazy solution would be to use html_entity_decode(), and to use range() only for the numeric ranges it was originally intended for (that it works with ASCII is a bit silly anyway):
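$i = 0;
$alphabets = [];
foreach (range(0x0915, 0x092E) as $char) {
    // Build a numeric character reference such as "&#2325;" and decode it to UTF-8.
    $char = html_entity_decode("&#$char;", ENT_COMPAT, "UTF-8");
    $alphabets[$char] = ++$i;
}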
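As a further option, on PHP 7.2 or later with the mbstring extension loaded, the built-in mb_ord() and mb_chr() could stand in for the hand-written code point helpers. The following is only a sketch under that assumption; unicode_range() is a hypothetical helper name, not a PHP built-in, and it expects each endpoint to be a single UTF-8 character.

```php
<?php
// Hypothetical helper (not part of PHP): returns all characters from $start
// to $end inclusive, ordered by Unicode code point.
// Assumes PHP >= 7.2 with mbstring and single-character UTF-8 endpoints.
function unicode_range(string $start, string $end): array
{
    $first = mb_ord($start, 'UTF-8');
    $last  = mb_ord($end, 'UTF-8');

    $chars = [];
    for ($codePoint = $first; $codePoint <= $last; ++$codePoint) {
        $chars[] = mb_chr($codePoint, 'UTF-8');
    }
    return $chars;
}

// Same mapping as in the question: character => 1-based position.
$alphabets = [];
foreach (unicode_range('क', 'म') as $index => $char) {
    $alphabets[$char] = $index + 1;
}

print_r($alphabets);
```

Note that this walks every code point between U+0915 and U+092E, so the result contains whatever characters Unicode assigns in that block, whether or not each one is in everyday use in Hindi.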