Comparing strings in PHP the same way MySQL does

I'm storing a varchar in a utf8 MySQL table and using utf8_general_ci collation. I have a unique index on the varchar. I'd like to do a string comparison in PHP that is equivalent to what MySQL will do on the index.

A specific example is that I'd like to be able to detect that 'a' is considered equivalent to 'À' in PHP before this happens:

mysql> insert UniTest (str) values ('a');                                   
Query OK, 1 row affected (0.00 sec)

mysql> insert UniTest (str) values ('À');                                   
ERROR 1062 (23000): Duplicate entry 'À' for key 1
twk asked Jan 22 '09 22:01


3 Answers

The collation has nothing to do with storage. The charset determines the storage encoding; the collation governs how comparison and sorting happen. A collation has to be defined for a particular charset, but beyond that the two are independent.
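To make the storage/comparison split concrete, here is a minimal PHP sketch. It only looks at the bytes each encoding produces and does not talk to MySQL; the variable names are illustrative.

```php
<?php
// "\u{C0}" is 'À' (U+00C0), written as an escape so the result does not
// depend on how this source file itself is encoded (needs PHP 7+).
$aGrave = "\u{C0}";

// Storage: the charset decides which bytes end up on disk.
$utf8Bytes   = bin2hex($aGrave);                                // two bytes in UTF-8
$latin1Bytes = bin2hex(iconv('UTF-8', 'ISO-8859-1', $aGrave));  // one byte in ISO-8859-1

echo $utf8Bytes, "\n";   // c380
echo $latin1Bytes, "\n"; // c0

// Comparison: PHP's === is byte-wise and knows nothing about collations,
// so 'a' and 'À' differ here even though utf8_general_ci treats them as equal.
$bytewiseEqual = ('a' === $aGrave);
var_dump($bytewiseEqual); // bool(false)
```

The same character has different bytes under different charsets, while whether it counts as equal to 'a' is a separate decision made by the collation.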

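To answer your question, you can use iconv to transliterate the text and then compare it. For example:

function compare($s1, $s2) {
  // Returns 0 when the strings are equivalent; strcasecmp because the collation is case-insensitive (_ci).
  // ASCII is the target because ISO-8859-1 contains 'À' and would pass it through unchanged.
  // The exact transliteration depends on the iconv implementation and the current locale.
  return strcasecmp(
    iconv('UTF-8', 'ASCII//TRANSLIT', $s1),
    iconv('UTF-8', 'ASCII//TRANSLIT', $s2));
}

This is roughly what MySQL does for you, although MySQL is probably faster and its collation table may differ slightly from what ASCII//TRANSLIT produces. I'm not entirely sure about that.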

It would probably be easier to use the database, though, as others have already suggested.
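If iconv's locale-dependent transliteration is a concern, one alternative is to build a comparison key by stripping accents and case. The sketch below assumes the intl and mbstring extensions are loaded; fold_key is a hypothetical helper name, and the result only approximates utf8_general_ci (for instance, MySQL's utf8_general_ci treats 'ß' as equal to 's', which this does not).

```php
<?php
// Hypothetical helper: reduce a UTF-8 string to an accent- and case-insensitive key.
// This is an approximation, not MySQL's actual weight table.
function fold_key(string $s): string
{
    // Canonical decomposition: 'À' becomes 'A' followed by U+0300 (combining grave).
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);

    // Drop the combining marks (Unicode category Mn), leaving the base letters.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    // Fold case so that 'A' and 'a' produce the same key.
    return mb_strtolower($stripped, 'UTF-8');
}

var_dump(fold_key('a') === fold_key("\u{C0}")); // "\u{C0}" is 'À'
```

Two strings with the same key are likely to collide in the unique index, but the database remains the authority on that.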

troelskn answered Sep 27 '22 22:09


Why don’t you just let MySQL decide whether there already is a record with the same key?

You could run a SELECT query to ask if there is already a record with this attribute:

SELECT 1
FROM UniTest
WHERE str = 'À';

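Or just try inserting the new record and use the functions mysql_error() and mysql_errno() to see whether an error occurred; error 1062 means a duplicate key. The mysql_* functions were removed in PHP 7, but mysqli and PDO expose the same error code.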
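The try-the-insert approach could look roughly like this with PDO. This is a sketch, not a definitive implementation: the function names are hypothetical, it assumes the UniTest table from the question, and it assumes the connection uses PDO::ERRMODE_EXCEPTION (the default since PHP 8).

```php
<?php
// Hypothetical helper: true when the exception is MySQL's duplicate-key error.
// 1062 is the error number shown in the question's output.
function is_duplicate_key_error(PDOException $e): bool
{
    // errorInfo is [SQLSTATE, driver error code, driver message]; it may be null.
    return isset($e->errorInfo[1]) && (int) $e->errorInfo[1] === 1062;
}

// Hypothetical helper: returns true if the row was inserted, false if MySQL
// rejected it as a duplicate under the column's collation.
function insert_unique(PDO $pdo, string $str): bool
{
    try {
        $stmt = $pdo->prepare('INSERT INTO UniTest (str) VALUES (?)');
        $stmt->execute([$str]);
        return true;
    } catch (PDOException $e) {
        if (is_duplicate_key_error($e)) {
            return false;
        }
        throw $e; // Anything else is a real failure, so let it propagate.
    }
}
```

With a connected $pdo, insert_unique($pdo, 'a') followed by insert_unique($pdo, 'À') would be expected to return true and then false, because MySQL itself applies utf8_general_ci when it checks the index.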

Gumbo answered Sep 27 '22 20:09


Use intl's Collator or Transliterator.

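$s1 = 'a';
$s2 = 'À';

var_dump(
    // Collator comparison at primary strength: accents and case are ignored.
    is_same_string($s1, $s2),
    // Transliterate $s2 to lowercase ASCII, then compare byte for byte.
    $s1 === transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $s2)
);

function is_same_string($str, $str2, $locale = 'en_US')
{
    $coll = collator_create($locale);
    // PRIMARY strength compares base letters only.
    collator_set_strength($coll, Collator::PRIMARY);
    return 0 === collator_compare($coll, $str, $str2);
}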

masakielastic answered Sep 27 '22 20:09