Using str_split on a UTF-8 encoded string

Tags: string, php, utf-8

I'm currently working on a project, and instead of using regular MySQL queries I thought I'd go ahead and learn how to use PDO.

I have a table called contestants; the database, the table, and all of the columns are in utf-8. I have ten entries in the contestants table, and their "name" column contains characters such as åäö.

Now, when I fetch an entry from the database, and var_dump the name, I get a good result, a string with all the special characters intact. But what I need to do is to split the string by characters, to get them in an array that I then shuffle.

For instance, I have this string: Test ÅÄÖ Tåän

And when I run str_split I get each character in its own key in an array. The only issue is that all the special characters display as this: �, meaning the array will be like this:

Array
(
    [0] => T
    [1] => e
    [2] => s
    [3] => t
    [4] =>  
    [5] => �
    [6] => �
    [7] => �
    [8] => �
    [9] => �
    [10] => �
    [11] =>  
    [12] => T
    [13] => �
    [14] => �
    [15] => �
    [16] => �
    [17] => n
)

As you can see, it not only messes up the characters, but it also duplicates them in the str_split process. I've tried several ways to split the string, but they all have the same issue. When I output the string before the split, it shows the special characters just fine.

This is my dbConn.php code:

// Require config file:
require_once('config.inc.php');

// Start PDO connection:
$dbHandle = new PDO("mysql:host=$dbHost;dbname=$dbName;charset=utf-8", $dbUser, $dbPass);
$dbHandle->exec("SET CHARACTER SET utf8");

// Set error reporting:
$dbHandle->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_WARNING);

And this is the code that I use to fetch from the database and loop:

// Require files:
require_once('dbConn.php');

// Get random artist:
$artist = $dbHandle->query("SELECT * FROM ".ARTIST_TABLE." WHERE id = 11 ORDER BY RAND() LIMIT 1");
$artist->setFetchMode(PDO::FETCH_OBJ);
$artist = $artist->fetch();
var_dump($artist->name);

// Split name:
$artistChars = str_split($artist->name);

I'm connecting with utf-8, my php file is utf-8 without BOM and no other special characters on this page share this issue. What could be wrong, or what am I doing wrong?

Jonathan asked Oct 19 '11 13:10



2 Answers

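Note that the charset declaration in your connection string is reported not to work. MySQL's name for the character set is utf8, not utf-8, and PHP versions before 5.3.6 ignore the charset option in the DSN altogether. In the comments on php.net I frequently see this alternative: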

$dbHandle = new PDO(
    "mysql:host=$dbHost;dbname=$dbName;charset=utf8",
    $dbUser,
    $dbPass,
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'")
);
Leo answered Oct 14 '22 10:10


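str_split does not work with multi-byte characters: it splits the string into single bytes, so every multi-byte character is broken into several invalid pieces. That is also why å, ä and ö each show up as two � entries in your array. Use preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) instead, or mb_str_split on PHP 7.4 and later. Note that mb_split splits on a regular-expression pattern, so it is not a drop-in replacement for str_split.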
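As a rough illustration of the difference between splitting by byte and splitting by character, here is a minimal sketch using the example string from the question. The helper name utf8_chars is hypothetical (it is not part of PHP), the code assumes PHP 7 or later, and the special characters are written as \u{...} escapes only so that the result does not depend on how the file itself is saved.

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters, not bytes.
function utf8_chars(string $s): array
{
    // mb_str_split needs the mbstring extension and PHP 7.4 or later.
    if (function_exists('mb_str_split')) {
        return mb_str_split($s, 1, 'UTF-8');
    }
    // Fallback: the /u modifier makes PCRE treat the subject as UTF-8.
    // preg_split returns false on invalid UTF-8, hence the ?: [].
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// "Test ÅÄÖ Tåän", the example string from the question.
$name = "Test \u{00C5}\u{00C4}\u{00D6} T\u{00E5}\u{00E4}n";

$bytes = str_split($name);   // one array entry per byte
$chars = utf8_chars($name);  // one array entry per character

echo count($bytes), "\n";    // 18: each of the five special characters is 2 bytes
echo count($chars), "\n";    // 13

// Shuffling the character array keeps every character intact.
$shuffled = $chars;
shuffle($shuffled);
echo implode('', $shuffled), "\n";
```

The byte count of 18 matches the 18 entries (keys 0 to 17) in the array shown in the question, which is why the special characters appeared to be duplicated.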

Wesley van Opdorp answered Oct 14 '22 10:10