
How do I convert a column to ASCII on the fly without saving to check for matches with an external ASCII string?

I have a member search function where you can enter parts of names, and it should return all members for whom at least one of username, firstname or lastname matches that input. The problem is that some names contain 'weird' characters, like the é in Renée, and users don't want to type the accented character but the plain ASCII substitute e.

In PHP I convert the input string to ASCII using iconv (just in case someone does type weird characters). In the database, however, I also need to convert the weird characters to ASCII for the strings to match.

I tried the following:

SELECT
  CONVERT(_latin1'Renée' USING ascii) t1, 
  CAST(_latin1'Renée' AS CHAR CHARACTER SET ASCII) t2;

(That's two attempts.) Neither works: both output Ren?e, where the question mark should be an e. It would be fine if the output were Ren?ee, since I could simply remove all question marks after the conversion.

As you can imagine, the columns I want to query are encoded as Latin1.

Thanks.

Rudie asked Dec 02 '22 04:12


2 Answers

You don't need to convert anything. Your requirement is to compare two strings and ask if they're equal, ignoring accents; the database server can use a collation to do that for you:

Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.

mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
|         1 |         1 |         1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
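
If the columns are latin1, as in the question, a utf8 collation cannot be applied to them directly (MySQL rejects that with a "COLLATION ... is not valid for CHARACTER SET" error), so one option is to convert on the fly before comparing. A minimal sketch, assuming a hypothetical members table with the three columns named in the question:

```sql
-- Hypothetical table `members`; column names are taken from the question.
-- CONVERT ... USING utf8 re-encodes the latin1 value at query time (nothing is saved),
-- so the accent-insensitive utf8_general_ci collation can be used for the comparison.
SELECT *
FROM members
WHERE CONVERT(username  USING utf8) COLLATE utf8_general_ci LIKE '%renee%'
   OR CONVERT(firstname USING utf8) COLLATE utf8_general_ci LIKE '%renee%'
   OR CONVERT(lastname  USING utf8) COLLATE utf8_general_ci LIKE '%renee%';
```

Wrapping a column in CONVERT means an index on it cannot be used, but a LIKE pattern with a leading % would not use one anyway.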

Vince Bowdren answered Dec 29 '22 00:12


First off, it should work this way:

SELECT * FROM `test` WHERE `name` COLLATE utf8_general_ci LIKE '%renee%';

Where the test table is:

+-----+--------+
| id  | name   |
+-----+--------+
|  1  | Renée  |
|  2  | Renêe  |
|  3  | Renee  |
+-----+--------+

What is your MySQL version, and how are you trying to match things?


Another possible solution is transliteration.

Related: PHP Transliteration

Transliterating the input should not be a problem, but transliterating the stored values (e.g. from the db) in real time during the search may not be feasible. Instead, you can add three more fields: username_slug, firstname_slug and lastname_slug. When inserting or modifying a record, set the slug values accordingly, and when searching, match the transliterated input against those slug fields.

+------+----------+---------------+----------+---------------+ ...
| id   | username | username_slug | lastname | lastname_slug | ...
+------+----------+---------------+----------+---------------+ ...
|    1 | Renée    | renee         | La Niña  | la-nina       | ...
|    2 | Renêe    | renee         | ...      | ...           | ...
|    3 | Renee    | renee         | ...      | ...           | ...
+------+----------+---------------+----------+---------------+ ...

A search for "renee" or "renèe" would match all of the records.
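
In SQL, the slug approach might look roughly like this. It is only a sketch: the members table name, the VARCHAR(255) sizes and the empty-string defaults are assumptions, and the slug values themselves are expected to come from the application (e.g. the PHP transliteration step), not from MySQL.

```sql
-- Hypothetical table `members`; the *_slug column names follow the table above.
ALTER TABLE members
  ADD COLUMN username_slug  VARCHAR(255) NOT NULL DEFAULT '',
  ADD COLUMN firstname_slug VARCHAR(255) NOT NULL DEFAULT '',
  ADD COLUMN lastname_slug  VARCHAR(255) NOT NULL DEFAULT '';

-- The application sets the slugs whenever a record is inserted or modified,
-- e.g. for record 1 from the table above:
UPDATE members
SET username_slug = 'renee',
    lastname_slug = 'la-nina'
WHERE id = 1;

-- Search using the already-transliterated input
-- (the application turns 'renèe' into 'renee' before building the query):
SELECT id, username, firstname, lastname
FROM members
WHERE username_slug  LIKE '%renee%'
   OR firstname_slug LIKE '%renee%'
   OR lastname_slug  LIKE '%renee%';
```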

As a side effect, you may be able to use these fields to generate SEF (search engine friendly) links, e.g. example.com/users/renee, which is why they are named ..._slug. Of course, in that case you should make sure the slug field is unique.
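
One way to check for clashes before using a slug in URLs is to look for values that occur more than once. A small sketch, assuming a hypothetical members table with a username_slug column:

```sql
-- Lists every username_slug shared by more than one member.
-- With the example data above, 'renee' would be reported with n = 3,
-- so those slugs would need to be made distinct before serving as link keys.
SELECT username_slug, COUNT(*) AS n
FROM members
GROUP BY username_slug
HAVING COUNT(*) > 1;
```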

Halil Özgür answered Dec 28 '22 23:12