I have a member search function where you can enter parts of names, and it should return all members whose username, firstname or lastname matches that input. The problem is that some names contain 'weird' characters, like the é in Renée, and the user doesn't want to type the weird character but the normal ASCII substitute e.
In PHP I convert the input string to ASCII using iconv (just in case someone types weird characters). In the database however I should also convert the weird chars to ASCII (obviously) for the strings to match.
I tried the following:
SELECT
CONVERT(_latin1'Renée' USING ascii) t1,
CAST(_latin1'Renée' AS CHAR CHARACTER SET ASCII) t2;
(That's two tries.) Neither works: both output Ren?e. The question mark should be an e. It would be alright if the output were Ren?ee, since I could just remove all question marks after the conversion.
As you can imagine, the columns I want to query are encoded Latin1.
Thanks.
You don't need to convert anything. Your requirement is to compare two strings and ask if they're equal, ignoring accents; the database server can use a collation to do that for you:
Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
|         1 |         1 |         1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
First off, it should work this way:
SELECT * FROM `test` WHERE `name` COLLATE utf8_general_ci LIKE '%renee%';
where the test table is:
+-----+--------+
| id  | name   |
+-----+--------+
|   1 | Renée  |
|   2 | Renêe  |
|   3 | Renee  |
+-----+--------+
What is your MySQL version, and how do you try to match things?
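If the column is stored as latin1, as in the question, MySQL rejects utf8_general_ci when it is applied to the column directly, because a collation has to belong to the character set of the expression it is attached to. A minimal sketch of one way around that, assuming the same test table with a latin1 name column, is to convert the value to utf8 first:

```sql
-- Assumption: `name` is a latin1 column, as described in the question.
-- CONVERT ... USING utf8 changes the character set of the expression,
-- so that the utf8_general_ci collation can then be applied to it.
SELECT *
FROM `test`
WHERE CONVERT(`name` USING utf8) COLLATE utf8_general_ci LIKE '%renee%';
```

A LIKE pattern with a leading % cannot use an ordinary index anyway, so the extra conversion mainly adds per-row work rather than changing how the query is executed.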
One of the other possible solutions is transliteration.
Related: PHP Transliteration
Transliterating the input should not be a problem, but transliterating the values from permanent storage (e.g. the database) in real time during the search may not be feasible. So you can add three more fields, such as username_slug, firstname_slug and lastname_slug. When inserting or modifying a record, set the slug values appropriately. When searching, match the transliterated input against those slug fields.
+------+----------+---------------+----------+---------------+ ...
| id   | username | username_slug | lastname | lastname_slug | ...
+------+----------+---------------+----------+---------------+ ...
|    1 | Renée    | renee         | La Niña  | la-nina       | ...
|    2 | Renêe    | renee         | ...      | ...           | ...
|    3 | Renee    | renee         | ...      | ...           | ...
+------+----------+---------------+----------+---------------+ ...
A search for "renee" or "renèe" would match all of the records.
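As a rough sketch of what that could look like in MySQL: the table name members and the VARCHAR(255) column type are assumptions made for illustration, and the slug values themselves are expected to be filled in by the application when a record is saved.

```sql
-- Hypothetical table name `members`; the column names follow the suggestion above.
ALTER TABLE `members`
  ADD COLUMN `username_slug`  VARCHAR(255) NOT NULL DEFAULT '',
  ADD COLUMN `firstname_slug` VARCHAR(255) NOT NULL DEFAULT '',
  ADD COLUMN `lastname_slug`  VARCHAR(255) NOT NULL DEFAULT '';

-- 'renee' stands for the user's input after it has been transliterated
-- in the application; only the plain ASCII slug columns are searched.
SELECT *
FROM `members`
WHERE `username_slug`  LIKE '%renee%'
   OR `firstname_slug` LIKE '%renee%'
   OR `lastname_slug`  LIKE '%renee%';
```

Because both sides of the comparison are already plain ASCII, the query does not depend on the collation of the original name columns.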
As a side effect, you may be able to use those fields for generating SEF (search engine friendly) links, e.g. example.com/users/renee, hence the ..._slug names. Of course, in that case you should make sure the slug field is unique.