SQL query that can find typos in Arabic

I am building a dictionary and need a query that can find words even when they are misspelled (typos or spelling errors). If the query cannot find the exact word, it should try again with other spellings.

Some background on Arabic: there are letters with (almost) the same pronunciation that are written differently, and people sometimes don't know which one to use. For example, there are 4 kinds of Z in Arabic, "ز / ظ / ذ / ض". Their pronunciations differ slightly, but people forget which one is the correct spelling. Here is one word written with each of the "z" letters:

مریز / مریض / مریظ / مریذ

The correct one is مریض.

Here are the other sounds that have more than one letter:

Z: ض / ز / ذ / ظ

T: ت / ط

S: ث / س / ص

Gh: ق / غ

So what is your idea? What should the query look like?

If a user searches for "مریز", instead of showing a 404 (not found) error I want to search the database with the other letters (all the Z variants) and return the result if anything is found.

asked Jan 09 '16 07:01

kiokoshin

3 Answers

In German, we have the same issue with t, tt, and dt, especially in names.

One way to approach this is to store an additional normalized column containing the name or word with a fixed set of transformations applied:

 tt -> t
 dt -> t
 ß  -> s
 ss -> s

So the table would contain

 WORD    | NORMALIZED
 schmitt | schmit
 schmidt | schmit

At query time, apply the same transformations to the search term and then compare it against the normalized column.
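
A minimal sketch of the same normalized-column idea applied to the Arabic letter groups from the question. It uses Python with an in-memory SQLite database purely for illustration; the table name `dictionary`, the helper names, and the choice of representative letter for each group are assumptions, not part of the original answer.

```python
import sqlite3

# Letter groups from the question. The key is the representative letter that
# every variant in the group is mapped to (an arbitrary choice for this sketch).
GROUPS = {
    "ز": "ضزذظ",
    "ت": "تط",
    "س": "ثسص",
    "ق": "قغ",
}
TABLE = str.maketrans({v: rep for rep, variants in GROUPS.items() for v in variants})


def normalize(word: str) -> str:
    """Collapse every confusable letter to its group's representative."""
    return word.translate(TABLE)


def build_db(words):
    """Create a hypothetical dictionary table with a precomputed normalized column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dictionary (word TEXT PRIMARY KEY, normalized TEXT)")
    conn.execute("CREATE INDEX idx_normalized ON dictionary (normalized)")
    conn.executemany(
        "INSERT INTO dictionary VALUES (?, ?)",
        [(w, normalize(w)) for w in words],
    )
    return conn


def lookup(conn, query: str):
    """Try an exact match first, then fall back to the normalized column."""
    rows = conn.execute(
        "SELECT word FROM dictionary WHERE word = ?", (query,)
    ).fetchall()
    if not rows:
        rows = conn.execute(
            "SELECT word FROM dictionary WHERE normalized = ? ORDER BY word",
            (normalize(query),),
        ).fetchall()
    return [r[0] for r in rows]
```

Because the normalized column is computed once at insert time and indexed, the fallback lookup is a plain equality comparison, which should stay cheap even for a large dictionary.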

answered Oct 18 '22 23:10

Jan

There is an algorithm called Levenshtein distance (there are others as well) that computes the edit distance between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other.

Using it, you can find the words in your dictionary that most closely resemble the input.

Later, you can assign lower weights to substitutions between the letter groups you mentioned to refine the search.

There is an implementation for MySQL that you should definitely check out: https://www.artfulsoftware.com/infotree/qrytip.php?id=552
Most of the Levenshtein + MySQL questions here on SO point to this page.
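
To illustrate the weighting idea, here is a rough Python sketch of a Levenshtein distance in which substituting one letter for another from the same confusable group costs less than an ordinary substitution. The cost value 0.25 and the function names are assumptions chosen for the example; this is not the MySQL implementation linked above.

```python
# Confusable letter groups taken from the question.
GROUPS = ["ضزذظ", "تط", "ثسص", "قغ"]
GROUP_OF = {ch: i for i, group in enumerate(GROUPS) for ch in group}


def sub_cost(a: str, b: str, confusable_cost: float = 0.25) -> float:
    """Substitution cost: 0 if equal, cheap if same group, otherwise 1."""
    if a == b:
        return 0.0
    group_a = GROUP_OF.get(a)
    if group_a is not None and group_a == GROUP_OF.get(b):
        return confusable_cost
    return 1.0


def weighted_distance(s: str, t: str) -> float:
    """Levenshtein distance with weighted substitutions (row-by-row DP)."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1.0,               # delete a from s
                cur[j - 1] + 1.0,            # insert b into s
                prev[j - 1] + sub_cost(a, b) # substitute a with b
            ))
        prev = cur
    return prev[-1]


def closest(query: str, words, limit: int = 3):
    """Return up to `limit` words ordered by weighted distance to the query."""
    return sorted(words, key=lambda w: (weighted_distance(query, w), w))[:limit]
```

With these weights, a candidate that differs from the input only by a confusable letter ranks ahead of one that differs by an unrelated letter, even though both are one plain edit away.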

answered Oct 18 '22 23:10

Koshinae

A simpler solution would be to use a regular expression with a character class. For letters that are likely to be misspelled, you keep all the variants in one character class; for the letters corresponding to Z, the class is "[زذظض]". Replace every ز، ذ، ظ، ض in the search term with that class and then run the query. Note that MySQL's LIKE does not support bracketed character classes (that is SQL Server syntax), so in MySQL the match has to use REGEXP. Before MySQL 8.0, REGEXP compared bytes rather than characters, so Arabic character classes need MySQL 8.0 or later.

select * from searched_table where word regexp 'مرى[زذظض]';

After you find all versions of the searched word, you can either show the user all of them, or calculate the Levenshtein distance (Koshinae's answer) and show the closest words.

Edit: for the letter Z only, the query would look like this:

set @word = 'مرىض'; -- take this text from the user
-- collapse every Z variant to a placeholder first
set @word = replace(@word, 'ذ', 'Z');
set @word = replace(@word, 'ظ', 'Z');
set @word = replace(@word, 'ض', 'Z');
set @word = replace(@word, 'ز', 'Z');
-- then expand the placeholder into the character class
set @word = replace(@word, 'Z', '[زظضذ]');
select @word; -- inspect the generated pattern

-- REGEXP matches anywhere in the column, so no % wildcards are needed
select * from mydb.searchTable where word regexp @word;
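
The replace chain above handles only the Z letters. As a sketch of how it might be generalized to all four groups from the question, the pattern can be built in application code and then handed to the database as a bound parameter. The function name `to_pattern` is an assumption, and the pattern is exercised here with Python's own `re` module rather than against a real MySQL server.

```python
import re

# Confusable letter groups taken from the question.
GROUPS = ["ضزذظ", "تط", "ثسص", "قغ"]
# Map each confusable letter to a character class containing its whole group.
CLASS_OF = {ch: "[" + group + "]" for group in GROUPS for ch in group}


def to_pattern(word: str) -> str:
    """Replace each confusable letter with its class; escape everything else."""
    return "".join(CLASS_OF.get(ch, re.escape(ch)) for ch in word)


def matches(query: str, candidate: str) -> bool:
    """Check whether candidate is a spelling variant of query."""
    return re.fullmatch(to_pattern(query), candidate) is not None
```

Escaping the remaining characters keeps user input such as `.` or `*` from being interpreted as regex syntax, which matters because the search term comes straight from the user.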

answered Oct 18 '22 21:10

Abdullah Nehir

Tags: performance, sql, letter, mysql, word