Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalize arabic text mysql

I am having trouble searching for arabic text in mysql. I have in database a row containing a record

display_name
أحمد

But when I try to do a query with

SELECT * FROM wp_users WHERE display_name LIKE '%احمد%'

I tried to add at the end of the query

collate utf8_bin

But it didn't work either. How can I have

احمد == أحمد
like image 351
Ahmed Avatar asked Apr 24 '14 14:04

Ahmed


People also ask

How to read and sort Arabic text in MySQL using PHP?

To read ,write and sort Arabic text in mysql database using php correctly, make sure that: mysql_query ("SET NAMES 'utf8'"); mysql_query ('SET CHARACTER SET utf8'); Don't use the deprecated mysql_query; switch to mysqli or PDO.

How to insert Arabic data manually in phpMyAdmin?

To insert Arabic Data manually into your Phpmyadmin. First you check either your database , table and column name is utf8 set or not. If these are not set to utf8 then first you set it then you may insert arabic data into you db table. YOU MAY CHECK EACH OF THESE BY LOOKING BELOW EXAMPLE.

How to insert Arabic data in SQL Server?

First you check either your database , table and column name is utf8 set or not. If these are not set to utf8 then first you set it then you may insert arabic data into you db table. YOU MAY CHECK EACH OF THESE BY LOOKING BELOW EXAMPLE. SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "schemaname";


Video Answer


1 Answers

I don't have an exact solution, but I can tell you why it's not working. If you want those two strings to be considered equal, you need to use a different collation since utf8_bin compares exact code points, and those two strings are clearly not identical when considered that way. Normally MySQL's utf8_general_ci collation would provide transliteration and normalisation, for example all these match:

SELECT 'a'='A' COLLATE utf8_general_ci;
SELECT 'ü'='u' COLLATE utf8_general_ci;
SELECT 'ß'='ss' COLLATE utf8_general_ci;

but in your case it doesn't work, and nor does the more accurate utf8_unicode_ci collation:

SELECT 'احمد'='أحمد' COLLATE utf8_general_ci;
SELECT 'احمد'='أحمد' COLLATE utf8_unicode_ci;

This chart shows the character mapping for middle eastern languages in MySQL's utf8_unicode_ci collation, and you can see that the أ and ا characters are not considered equal, so MySQL's default collations won't solve this problem.

To work around this you have two options: normalise your strings before they hit MySQL (i.e. in PHP), or extend MySQL to provide an appropriate collation to do what you need.

The Ar-PHP project can help with the former, as sємsєм suggested. You should store your real user name and the normalised one separately so that you can search on one and display the other. Another project also provides a way of rephrasing arabic strings to work better in MySQL.

MySQL docs show how to create a custom collation. It essentially involves editing an LDML XML file (there is at least a BBEdit plugin to help with this) and giving it to MySQL. This will allow you to create a mapping that lets you treat some characters as equivalents. The advantage of this approach is that it's transparent to PHP and you don't need any additional columns in your database. If you build such a mapping, it would be beneficial to other Arabic users across multiple programming languages, not just PHP.

like image 132
Synchro Avatar answered Oct 16 '22 09:10

Synchro