I need to get an array with all the characters from a word, but the word contains non-ASCII letters such as á. When I execute the following code:
$word = 'withá';
$word_arr = array();
for ($i=0;$i<strlen($word);$i++) {
$word_arr[] = $word[$i];
}
or
$word_arr = str_split($word);
I get:
array(6) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(1) "Ã" [5]=> string(1) "¡" }
How can I obtain each character as follows?
array(5) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(1) "á" }
Because it is a UTF-8 string, just do
$word = 'withá';
$word = utf8_decode($word);
$word_arr = array();
for ($i=0;$i<strlen($word);$i++) {
$word_arr[] = $word[$i];
}
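The reason is that, even though the string looks right in your script, it is stored as UTF-8, where á occupies two bytes, and strlen() and $word[$i] operate on bytes rather than characters. utf8_decode() converts the string from UTF-8 to single-byte ISO-8859-1, so each character becomes exactly one byte. This only works for characters that exist in ISO-8859-1; for anything else, use the mb functions instead.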
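To see the byte/character mismatch for yourself, here is a minimal sketch. It assumes the mbstring extension is loaded, and it writes á as explicit UTF-8 bytes so the result does not depend on how the file is saved:

```php
<?php
// 'withá' with á spelled out as its two UTF-8 bytes (0xC3 0xA1).
$word = "with\xC3\xA1";

$byteCount = strlen($word);             // counts bytes
$charCount = mb_strlen($word, 'UTF-8'); // counts characters (needs mbstring)

echo $byteCount, "\n";                    // 6
echo $charCount, "\n";                    // 5
echo bin2hex($word[4] . $word[5]), "\n";  // c3a1, the two halves of á
```

Indexing with $word[$i] hands back those two bytes one at a time, which is why the array in the question has six elements.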
I think mb_split will do it for you: http://www.php.net/manual/en/function.mb-split.php
If you're using special encodings, you probably want to read up on how PHP handles multibyte encoding in general...
EDIT: Nope, I can't figure out how to make mb_split do it myself, but looking around SO I found some other questions that were answered with preg_split. I tested this and it seems to do exactly what you want:
preg_split('//u',$word,-1,PREG_SPLIT_NO_EMPTY);
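A slightly fuller sketch of the preg_split approach, assuming the input is valid UTF-8. The `u` pattern modifier is what makes PCRE split on characters instead of bytes. The helper name split_utf8_chars is hypothetical, made up for this example; mb_str_split() is a built-in alternative on PHP 7.4 and later:

```php
<?php
// Hypothetical helper (not a PHP built-in): split a UTF-8 string into characters.
function split_utf8_chars(string $s): array
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
    return $chars === false ? [] : $chars;
}

// 'withá' with á written as explicit UTF-8 bytes, independent of file encoding.
$word = "with\xC3\xA1";
$word_arr = split_utf8_chars($word);

echo count($word_arr), "\n";        // 5
echo bin2hex($word_arr[4]), "\n";   // c3a1, i.e. á kept as a single element

// On PHP 7.4+, the built-in mb_str_split() gives the same result.
if (function_exists('mb_str_split')) {
    var_dump(mb_str_split($word, 1, 'UTF-8') === $word_arr); // bool(true)
}
```

Note that var_dump() will report the last element as string(2), because á is still two bytes long even though it is one character.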
I'd still strongly suggest you read up on multibyte characters in PHP though. It's kind of a mess, IMHO.
Here are some good links: http://www.joelonsoftware.com/articles/Unicode.html and http://akrabat.com/php/utf8-php-and-mysql/ and plenty more can be found...