Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Accent-insensitive sorting in MySQL

I am trying to achieve accent and case-insensitive sorting in MySQL. Following the instructions in the manual, this is supposed to work with the utf8 character set and utf8_general_ci collation.

When I follow the example in the manual (http://dev.mysql.com/doc/refman/5.1/en/charset-collation-implementations.html) under "Collations for Unicode multi-byte character sets" I do not get the same results:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 679877
Server version: 5.1.41-log MySQL Community Server (GPL) by Remi

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
|         1 |         0 |         0 |
+-----------+-----------+-----------+
1 row in set (0.00 sec)

mysql> 

In the example in the manual, those are all 1.

It also fails to treat accented characters equally when I try to set the collation directly in a query. In this example, the table is using latin1 and I'm converting to utf8.

mysql> select * from test;
+----------+
| k        |
+----------+
| Cárdenas |
| Cardozo  |
| Corbin   |
| Cabrero  |
+----------+

mysql> select k from test order by convert(k using utf8) collate utf8_general_ci
;
+----------+
| k        |
+----------+
| Cabrero  |
| Cardozo  |
| Corbin   |
| Cárdenas |
+----------+
4 rows in set (0.00 sec)

It should be ignoring the accent over the 'a' in the last entry and sorting it second. Any ideas what I'm doing wrong?

like image 252
Lee Avatar asked Mar 05 '10 18:03

Lee


People also ask

Is MySQL select case sensitive?

Table names are stored in lowercase on disk and name comparisons are not case-sensitive. MySQL converts all table names to lowercase on storage and lookup. This behavior also applies to database names and table aliases.

How do I sort numbers in MySQL?

The ORDER BY keyword is used to sort the result-set in ascending or descending order. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.

What is MySQL collation?

A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations. A collation orders characters based on weights.


1 Answers

It works on my default MySQL installation. Since you haven't provided a SHOW FULL COLUMNS from test (@fsb comment) it's still possible that the collation in your table structure is incorrect.

  • Is the the collation of column 'k' set to something other than utf8_general_ci?

  • Check whether SELECT k from Names ORDER BY k ASC is giving the right answer

  • Has the MySQL installation Index.xml file been modified to change the meaning of utf8_general_ci?

Relevant part of my installation for comparison:

<charset name="utf8">
  <family>Unicode</family> 
  <description>UTF-8 Unicode</description> 
  <alias>utf-8</alias> 
  <collation name="utf8_general_ci" id="33">
    <flag>primary</flag> 
    <flag>compiled</flag> 
  </collation>
  <collation name="utf8_bin" id="83">
    <flag>binary</flag> 
    <flag>compiled</flag> 
  </collation>
 </charset>
  • Has the compiled code been modified to change the meaning of utf8_general_ci?

Conceivably someone else has tweaked one of these things for some nefarious purpose...

like image 156
MZB Avatar answered Sep 30 '22 01:09

MZB