Does MySql full text search works reasonably with non-Latin languages? (Hebrew, Arabic, Japanese...)
Addition: Did some tests... It has some problems with Hebrew. Example: The name מוסינזון is pronounced the same as מושינזון but searching one won't find the other, as this is a common spelling error in Hebrew, it seems I will have to do some data manipulation for it to work perfectly.
MySQL has support for full-text indexing and searching: A full-text index in MySQL is an index of type FULLTEXT . Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR , VARCHAR , or TEXT columns.
Full-Text Search in MySQL server lets users run full-text queries against character-based data in MySQL tables. You must create a full-text index on the table before you run full-text queries on a table. The full-text index can include one or more character-based columns in the table.
To perform a case-sensitive full-text search, use a case-sensitive or binary collation for the indexed columns. For example, a column that uses the utf8mb4 character set of can be assigned a collation of utf8mb4_0900_as_cs or utf8mb4_bin to make it case-sensitive for full-text searches.
ALTER TABLE table_name ADD FULLTEXT(column_name1, column_name2,…) In this syntax, First, specify the name of the table that you want to create the index after the ALTER TABLE keywords. Second, use the ADD FULLTEXT clause to define the FULLTEXT index for one or more columns of the table.
Though Hebrew support in MySQL is limited, your problem is more a problem of people using incorrect spelling, then a dysfunction of the MySQL server in this perspective. When you misspell a word in Google, it will show you a suggestion and you can click on that suggestion to search for that term.
Perhaps you could build some program that has the same behaviour, e.g. you could create a table that has 2 fields, one containing the commonly misspelled word and the other containing the correct spelling. You could then build a program that finds the misspelled word and displays the suggestion.
So long as your collation is set properly, it works splendidly.
Unicode will work for most of this, of course. But that doesn't really translate Latin characters to them very well (for example, in a Dutch collation aa
will be recognized as å
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With