I am trying to achieve accent and case-insensitive sorting in MySQL. Following the instructions in the manual, this is supposed to work with the utf8 character set and utf8_general_ci collation.
When I follow the example in the manual (http://dev.mysql.com/doc/refman/5.1/en/charset-collation-implementations.html) under "Collations for Unicode multi-byte character sets" I do not get the same results:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 679877
Server version: 5.1.41-log MySQL Community Server (GPL) by Remi
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
| 1 | 0 | 0 |
+-----------+-----------+-----------+
1 row in set (0.00 sec)
mysql>
In the example in the manual, those are all 1.
It also fails to treat accented characters equally when I try to set the collation directly in a query. In this example, the table is using latin1 and I'm converting to utf8.
mysql> select * from test;
+----------+
| k |
+----------+
| Cárdenas |
| Cardozo |
| Corbin |
| Cabrero |
+----------+
mysql> select k from test order by convert(k using utf8) collate utf8_general_ci
;
+----------+
| k |
+----------+
| Cabrero |
| Cardozo |
| Corbin |
| Cárdenas |
+----------+
4 rows in set (0.00 sec)
It should be ignoring the accent over the 'a' in the last entry and sorting it second. Any ideas what I'm doing wrong?
Table names are stored in lowercase on disk and name comparisons are not case-sensitive. MySQL converts all table names to lowercase on storage and lookup. This behavior also applies to database names and table aliases.
The ORDER BY keyword is used to sort the result-set in ascending or descending order. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.
A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations. A collation orders characters based on weights.
It works on my default MySQL installation. Since you haven't provided a SHOW FULL COLUMNS from test (@fsb comment) it's still possible that the collation in your table structure is incorrect.
Is the the collation of column 'k' set to something other than utf8_general_ci?
Check whether SELECT k from Names ORDER BY k ASC is giving the right answer
Has the MySQL installation Index.xml file been modified to change the meaning of utf8_general_ci?
Relevant part of my installation for comparison:
<charset name="utf8">
<family>Unicode</family>
<description>UTF-8 Unicode</description>
<alias>utf-8</alias>
<collation name="utf8_general_ci" id="33">
<flag>primary</flag>
<flag>compiled</flag>
</collation>
<collation name="utf8_bin" id="83">
<flag>binary</flag>
<flag>compiled</flag>
</collation>
</charset>
Conceivably someone else has tweaked one of these things for some nefarious purpose...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With