Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL DISTINCT and accents

A MySQL database running on Debian (version 5.5.41-0+wheezy1-log).

A table hotels with a column name VARCHAR(128) and engine is InnoDB.

CREATE TABLE `hotels` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT
  `name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Hotel Name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;

There are two records in this table:

1    BEST WESTERN PREMIER LE CARRE FOLIES OPERA
2    BEST WESTERN PREMIER LE CARRÉ FOLIES OPÉRA

When executing select DISTINCT name FROM hotels, the query is returning only 1 record, while 2 records were expected to be returned.

The DBMS doesn't seem to distinct between E and É.

How to change the table settings in order to get the expected result?

like image 863
Stefaan Neyts Avatar asked Apr 10 '15 21:04

Stefaan Neyts


1 Answers

The table collation was set to utf8_general_ci. This was the default setting of the MySQL server and the schema.

There are 3 collation names available in MySQL 5.5:

  • A name ending in _ci indicates a case-insensitive collation.
  • A name ending in _cs indicates a case-sensitive collation.
  • A name ending in _bin indicates a binary collation. Character comparisons are based on character binary code values.

The collation had to be changed to utf8_bin. This can be done for the table and all columns with this query:

ALTER TABLE hotels CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
like image 166
Stefaan Neyts Avatar answered Sep 29 '22 10:09

Stefaan Neyts