
How to emulate MySQL's utf8_general_ci collation in PHP string comparisons

Basically, if two strings would evaluate as the same in my database I'd also like to be able to check that at the application level. For example, if somebody enters "bjork" in a search field, I want PHP to be able to match that to the string "Björk" just as MySQL would.

I'm guessing PHP has no direct equivalent to MySQL's collation options, and that the easiest thing to do would be to write a simple function that converts the strings, using strtolower() to make them uniformly lower-case and strtr() to replace multi-byte characters with their corresponding ASCII equivalents.

Is that an accurate assumption? Does anybody have a fool-proof array handy to use as the second parameter of strtr() for conforming strings the way various MySQL collations would (specifically, for my current needs, utf8_general_ci)? Or, lacking that, where could I find documentation of exactly how the different collations in MySQL treat various characters? (I saw somewhere that in some collations ß is treated as S and in others as Ss, for instance, but it didn't outline every character evaluation.)

Thor asked Dec 15 '11 02:12



1 Answer

Here's what I've been using, but I have yet to test it for complete consistency with MySQL.

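function collation_conform($string, $collation = 'utf8_general_ci')
{
    // Only utf8_general_ci is mapped so far; fail loudly for anything else.
    if ($collation !== 'utf8_general_ci') {
        throw new InvalidArgumentException('Unsupported Collation (collation_conform() collation_helper.php)');
    }

    // Non-string values (numbers, null, ...) are passed through unchanged.
    if (!is_string($string)) {
        return $string;
    }

    // Fold accented Latin characters to base letters. This file must be saved
    // as UTF-8 so the array keys match the bytes of the input string.
    // ß maps to a single S: MySQL documents ß = s for utf8_general_ci
    // (ß = ss is the utf8_unicode_ci behaviour).
    // ð now maps to d so that it agrees with the Ð => D entry.
    $string = strtr($string, array(
        'Š'=>'S', 'š'=>'s', 'Ð'=>'D', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
        'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
        'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
        'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'S', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
        'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
        'ï'=>'i', 'ð'=>'d', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
        'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'));

    // mb_strtolower() (mbstring extension) also lower-cases multi-byte
    // characters that are missing from the map; strtolower() would not.
    return mb_strtolower($string, 'UTF-8');
}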
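To show how the function is meant to be used for the "bjork" vs. "Björk" case from the question, here is a rough sketch. collation_equals() is a hypothetical helper name, not part of the code above, and the stand-in collation_conform() with its tiny map is only there so the snippet runs on its own; it assumes the file is saved as UTF-8.

```php
<?php
// Stand-in so this sketch runs by itself. It is skipped when the full
// collation_conform() shown above has already been loaded.
if (!function_exists('collation_conform')) {
    function collation_conform($string, $collation = 'utf8_general_ci')
    {
        // Deliberately tiny map, for illustration only.
        $string = strtr($string, array('Ö' => 'O', 'ö' => 'o', 'É' => 'E', 'é' => 'e'));
        return strtolower($string);
    }
}

// Hypothetical helper: true when both strings fold to the same value,
// i.e. when this approach considers them equal under the collation.
function collation_equals($a, $b)
{
    return collation_conform($a) === collation_conform($b);
}

var_dump(collation_equals('bjork', 'Björk'));  // bool(true)
var_dump(collation_equals('bjork', 'Bjarki')); // bool(false)
```

Folding both sides before comparing keeps the comparison itself a plain ===, so the same folded values can also be passed to strpos() for substring searches.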
Thor answered Nov 13 '22 15:11
