How to get each character from a word with special encoding

Question

I need to get an array of all the characters in a word, but the word contains letters with special encoding, such as á. When I execute the following code:

$word = 'withá';

$word_arr = array();
for ($i=0;$i<strlen($word);$i++) {
    $word_arr[] = $word[$i];
}

or

$word_arr = str_split($word);

I get:

array(6) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(1) "Ã" [5]=> string(1) "¡" }

How can I obtain each character as follows?

array(5) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(1) "á" }

Tim Withers · Accepted Answer

Because it is a UTF-8 string, just do

$word = 'withá';
$word = utf8_decode($word);
$word_arr = array();
for ($i=0;$i<strlen($word);$i++) {
    $word_arr[] = $word[$i];
}

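The reason for this is that, even though the word looks right in your script, it is stored as UTF-8, where á is a multibyte character (two bytes), while strlen() and $word[$i] work byte by byte. utf8_decode() converts the string from UTF-8 to single-byte ISO-8859-1, so every character becomes exactly one byte. This only works for characters that exist in ISO-8859-1, the resulting array holds ISO-8859-1 bytes rather than UTF-8, and utf8_decode() is deprecated as of PHP 8.2. Alternatively, you can use the mb functions, which work on the UTF-8 string directly.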
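To make the byte versus character difference concrete, here is a minimal sketch using the mb functions. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$word = 'withá';

// strlen() counts bytes; in UTF-8, "á" is the two bytes C3 A1.
$byteCount = strlen($word);

// mb_strlen() counts characters when given the encoding.
$charCount = mb_strlen($word, 'UTF-8');

// Hex dump of the fifth character, showing its two bytes.
$lastCharHex = bin2hex(mb_substr($word, 4, 1, 'UTF-8'));

// Same loop as in the question, but stepping by character
// with mb_substr() instead of by byte with $word[$i].
$chars = [];
for ($i = 0; $i < $charCount; $i++) {
    $chars[] = mb_substr($word, $i, 1, 'UTF-8');
}

var_dump($byteCount, $charCount, $lastCharHex, $chars);
```

On PHP 7.4 and later, mb_str_split($word, 1, 'UTF-8') should give the same array in a single call.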

Aerik · Answer

~~I think mb_split will do it for you: http://www.php.net/manual/en/function.mb-split.php~~

If you're using special encodings, you probably want to read up on how PHP handles multibyte encoding in general...

EDIT: Nope, I can't figure out how to make mb_split do it myself, but looking around SO I found some other questions that were answered with preg_split. I tested this and it seems to do exactly what you want (note the u modifier, which makes the pattern match between UTF-8 characters instead of between bytes):

preg_split('//u',$word,-1,PREG_SPLIT_NO_EMPTY);
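For reuse, the preg_split call could be wrapped roughly like this. It is only a sketch: split_utf8_chars is a hypothetical helper name, not a built-in, and it assumes the input is meant to be UTF-8.

```php
<?php
/**
 * Hypothetical helper: split a UTF-8 string into an array of characters.
 */
function split_utf8_chars(string $text): array
{
    // The "u" modifier makes PCRE treat the pattern and subject as UTF-8,
    // so the empty pattern matches between code points, not between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    if ($chars === false) {
        // preg_split() returns false on failure, for example when
        // $text is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $chars;
}

$word_arr = split_utf8_chars('withá');
var_dump($word_arr);
```

Note that this splits by code point, so a letter written as a base character plus a separate combining accent would come back as two elements.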

I'd still strongly suggest you read up on multibyte characters in PHP though. It's kind of a mess, IMHO.

Here are some good links: http://www.joelonsoftware.com/articles/Unicode.html and http://akrabat.com/php/utf8-php-and-mysql/, and plenty more can be found.

Tags: php, character-encoding, encoding, tokenize

leticia
