Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

normalizing accented characters in MySQL queries

I'd like to be able to do queries that normalize accented characters, so that for example:

é, è, and ê

are all treated as 'e', in queries using '=' and 'like'. I have a row with username field set to 'rené', and I'd like to be able to match on it with both 'rene' and 'rené'.

I'm attempting to do this with the 'collate' clause in MySQL 5.0.8. I get the following error:

mysql> select * from User where username = 'rené' collate utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

FWIW, my table was created with:

CREATE TABLE `User` (
  `id` bigint(19) NOT NULL auto_increment,
  `username` varchar(32) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `uniqueUsername` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=56790 DEFAULT CHARSET=utf8
like image 217
George Armhold Avatar asked Feb 20 '10 16:02

George Armhold


People also ask

How do I remove special characters from a MySQL query?

You can remove special characters from a database field using REPLACE() function.

Does MySQL allow special characters?

Alternatively, MySQL also has special character escape sequences as shown below: \0 - An ASCII NUL (0x00) character. \' - A single quote ( ' ) character. \" - A double quote ( " ) character.

Does MySQL support normalization?

Third Normal Form (3NF) in MySQLIn the third normal form, the database is already in the third normal form, if it is in the second normal form. Every non-key column must be mutually independent. Identify any columns in the table that are interdependent and break those columns into their own separate tables.

What is MySQL collate?

A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations. A collation orders characters based on weights.


1 Answers

The reason for the error is not the table but the characterset of your input, i.e. the 'rené' in your query. The behaviour depends on the character_set_connection variable:

The character set used for literals that do not have a character set introducer and for number-to-string conversion.

Using the MySQL Client, change it using SET NAMES:

A SET NAMES 'charset_name' statement is equivalent to these three statements:

SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;

(from http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html)

Example output:

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from User where username = 'rené' collate utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from User where username = 'rené' collate utf8_general_ci;
Empty set (0.00 sec)

Altenatively, use can explicitly set the character set using a 'character set introducer':

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from User where username = _utf8'rené' collate utf8_general_ci;
Empty set (0.00 sec)

I know this question is pretty old but since Google led me here for a related question, I though it still deserves an answer :)

like image 66
sfussenegger Avatar answered Sep 29 '22 03:09

sfussenegger