Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I control UTF-8 ordering in MySQL?

For example, this is the order that a standard 'SORT BY name' query is returning:

name
------
Cebbb
Čebbc
Cebbd

I'd like to SORT BY name and get the accented character first, grouped with other accented characters, e.g.

name
------
Čebbc
Cebbb
Cebbd

By default, MySql treats Č as if it were C, for sorting purposes, and sorts in the order shown.

Alternatively, is there a way, in PHP, that I can 'convert' Č to C, for comparison purposes?

like image 240
Bobby Jack Avatar asked May 04 '12 10:05

Bobby Jack


2 Answers

You can add a COLLATE expression to your ORDER BY clause:

SELECT k
FROM t1
ORDER BY k COLLATE utf8_spanish_ci;

Then, the column will be sorted using the correct collate, and the column definition can stay as it is.

like image 191
Yogu Avatar answered Sep 18 '22 23:09

Yogu


The simplest way is to apply a proper collation to the column itself, e.g.:

CREATE TABLE foo (
    foo_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(30) NOT NULL COLLATE 'utf8_spanish_ci',
    PRIMARY KEY (`foo_id`)
)
COLLATE='utf8_spanish_ci'
ENGINE=InnoDB;

Update:

Č isn't a Spanish letter:

In Croatian, Slovenian, Bosnian, Skolt Sami, and Lakota alphabets, it is the fourth letter of the alphabet. In Czech, Northern Sámi alphabet and the Baltic languages Lithuanian and Latvian, the letter is in fifth place. In Slovak it is sixth letter of the alphabet. It is also used in Pashto (equivalent to چ‎) and Saanich.

Unlike numbers, letters don't have a fixed absolute order. The whole concept of alphabetic order is relative—you have to pick a rule set first. Run this SQL query to know which ones are available in your MySQL server:

SHOW CHARACTER SET

... and chose the one that better fits your expectations. There's a brief description in the manual.

like image 36
Álvaro González Avatar answered Sep 20 '22 23:09

Álvaro González