
PHP preg_split utf8 characters

I have a problem with preg_split and UTF-8 text. This is the code:

$original['words'] = preg_split("/[\s]+/", $original['text']);
print_r($original);

This is the output:

Array
(

    [text] => Šios baterijos kaista
    [words] => Array
        (
            [0] => �
            [1] => ios
            [2] => baterijos
            [3] => kaista

This code is running in the CakePHP framework. Note that [text] is displayed correctly before the split and only gets mangled in [words] during the split. By the way, I also tried these settings:

mb_internal_encoding( 'UTF-8'); 
mb_regex_encoding( 'UTF-8');  
ini_set('default_charset','utf-8');

None of them helped. Thank you.

Paulius asked Feb 28 '13 14:02


1 Answer

You need to enable UTF-8 mode for preg_split by adding the u modifier to the regular expression:

preg_split("/[\s]+/u", $original['text']);

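The settings you tried while looking for a solution play no role here: mb_internal_encoding and mb_regex_encoding only affect the mbstring functions, and default_charset does not change how the preg_* functions read the pattern or the subject.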
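
As a minimal sketch of the fix, the function below wraps the call with the u modifier. The name splitWords is hypothetical (it is not part of PHP or CakePHP), and the sample string uses explicit byte escapes for Š so the result does not depend on how the file is saved. With the u modifier, preg_split returns false when the subject is not valid UTF-8, so the sketch checks for that case.

```php
<?php
// Hypothetical helper for illustration; not part of PHP or CakePHP.
function splitWords(string $text): array
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so multi-byte characters are never split in the middle.
    // PREG_SPLIT_NO_EMPTY drops empty pieces from leading/trailing whitespace.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    if ($words === false) {
        // With /u this happens, for example, when $text is not valid UTF-8.
        throw new InvalidArgumentException(
            'preg_split failed, error code ' . preg_last_error()
        );
    }

    return $words;
}

// "Š" is U+0160, encoded in UTF-8 as the two bytes C5 A0.
$text = "\xC5\xA0ios baterijos kaista";

// Yields the three words: Šios, baterijos, kaista.
print_r(splitWords($text));
```

A likely explanation for the original output, which depends on the PCRE build and locale: without the u modifier the subject is matched byte by byte, and \s can match the byte A0 (a non-breaking space in Latin-1), which is the second byte of Š. That would cut the character in half and leave the lone byte C5 as the first element.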

Jon answered Nov 17 '22 17:11