
Best way to deal with misspellings in a MySQL fulltext search


I have about 2000 rows in a MySQL database.

Each row is a max of 300 characters and contains a sentence or two.

I use MySQL's built-in fulltext search to search these rows.

I would like to add a feature so that typos and accidental misspellings are corrected, if possible.

For example, if someone types "right shlder" into the search box, this would equate to "right shoulder" when performing the search.

What are your suggestions on the simplest way to add this kind of functionality? Is it worth adding an external search engine of some kind, like Lucene? (It seems like this is overkill for such a small dataset.) Or is there a simpler way?

Travis asked Aug 26 '11 06:08


People also ask

What is advantage of fulltext over like for performing text search in MySQL?

Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall. Most full text search implementations use an "inverted index".

How do you perform a full-text case sensitive search in MySQL?

To perform a case-sensitive full-text search, use a case-sensitive or binary collation for the indexed columns. For example, a column that uses the utf8mb4 character set can be assigned a collation of utf8mb4_0900_as_cs or utf8mb4_bin to make it case-sensitive for full-text searches.

Does MySQL support FULL text search?

MySQL has support for full-text indexing and searching: A full-text index in MySQL is an index of type FULLTEXT . Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR , VARCHAR , or TEXT columns.

How does full-text search work in MySQL?

The basic query format of full-text searches in MySQL should be similar to the following: SELECT * FROM table WHERE MATCH(column) AGAINST('string' IN NATURAL LANGUAGE MODE); When MATCH() is used in a WHERE clause, the rows are automatically sorted by the highest relevance first.


2 Answers

I think you should use SOUNDS LIKE or SOUNDEX()

As your data set is so small, one solution may be to create a new table to store the individual words or soundex values contained in each text field and use SOUNDS LIKE on that table.

e.g.:

    SELECT * FROM table
    WHERE id IN (
        SELECT refid FROM tableofwords
        WHERE column SOUNDS LIKE 'right'
           OR column SOUNDS LIKE 'shlder'
    )
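The word table this query relies on has to be filled first. A minimal sketch of that step, in Python (an assumption, since the question does not name an application language): it splits each row's text into distinct lowercase words and produces one (refid, word) pair per word. The function name `word_rows` and the sample rows are hypothetical; `tableofwords`, `refid` and `column` are the placeholder names from the query above.

```python
import re

def word_rows(rows):
    """Yield one (refid, word) pair per distinct word in each row's text."""
    for refid, text in rows:
        # Lowercase the text and keep runs of letters and digits as words.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        for word in sorted(words):
            yield refid, word

# Hypothetical sample data standing in for the ~2000 rows in the main table.
rows = [
    (1, "Pain in the right shoulder."),
    (2, "Right knee swelling."),
]
pairs = list(word_rows(rows))

# Each pair would then be inserted with the database driver of your choice,
# for example via a DB-API cursor.executemany() call and a statement such as:
INSERT_SQL = "INSERT INTO tableofwords (refid, `column`) VALUES (%s, %s)"
```

With only 2000 short rows, rebuilding this table from scratch whenever the text changes is probably cheap enough that no incremental update logic is needed.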

See: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html

I believe it is not possible to do a wildcard search on the string :(

Kevin Burton answered Oct 13 '22 23:10


MySQL doesn't support SOUNDEX search in fulltext.

If you want to implement a Lucene-like framework, you have to take all the documents, split them into words, and then build an index entry for each word.

When someone searches for "right shlder", you have to run a SOUNDEX search for each word against the words table:

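    // Split the search string into words and compute a Soundex code for each.
    $search = 'right shlder';
    preg_match_all('(\w+)', $search, $matches);
    $sounds = array_map('soundex', $matches[0]);

    // PHP's soundex() returns 4 characters, while MySQL's SOUNDEX() can return
    // longer strings, so the MySQL side is truncated before comparing. The two
    // implementations may still disagree for some words.
    if (!empty($sounds)) {
        $query = 'SELECT word FROM words_list
            WHERE LEFT(SOUNDEX(word), 4) IN(\'' . join('\',\'', $sounds) . '\')';
    }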
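To see why this lookup catches the typo from the question, here is a minimal Python sketch of the same idea done in memory. It implements standard American Soundex, which is an assumption: MySQL's SOUNDEX() and PHP's soundex() may differ from it for some words. The names `soundex`, `candidates` and the sample vocabulary are hypothetical.

```python
# Map each consonant to its Soundex digit; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character American Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            result += code
        # h and w do not separate two letters that share a digit.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]

def candidates(search, vocabulary):
    """Return the vocabulary words that sound like any word in the search."""
    by_code = {}
    for word in vocabulary:
        by_code.setdefault(soundex(word), set()).add(word)
    found = set()
    for term in search.split():
        found |= by_code.get(soundex(term), set())
    return sorted(found)

# Hypothetical vocabulary standing in for the words_list table.
vocabulary = ["right", "shoulder", "left", "knee"]
matches = candidates("right shlder", vocabulary)
```

Here "shlder" and "shoulder" both reduce to S436, so the misspelled term still finds the intended word. Soundex only helps when the first letter is correct and the consonants are roughly right, so it will miss typos such as "rihgt".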

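Then run the fulltext search with the matched words:

    // In boolean mode, space-separated words without operators are optional,
    // so a row matches if it contains any of them.
    $query2 = 'SELECT * FROM table
        WHERE MATCH(fulltextcolumn)
        AGAINST (\'' . join(' ', $results) . '\' IN BOOLEAN MODE)';

where $results is an array holding the words returned by the first query.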
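The second step can also be written with a bound parameter instead of string concatenation. A minimal Python sketch, assuming the `%s` placeholder style used by common Python MySQL drivers; the function name `build_fulltext_query` is hypothetical, and `table` and `fulltextcolumn` are the placeholder names from the answer.

```python
def build_fulltext_query(words):
    """Return (sql, params) for a boolean-mode search matching any of the words."""
    # Keep only plain alphanumeric words so that boolean-mode operators
    # such as + - * " cannot end up inside the search expression.
    safe = [w for w in words if w.isalnum()]
    sql = ("SELECT * FROM `table` "
           "WHERE MATCH(fulltextcolumn) AGAINST (%s IN BOOLEAN MODE)")
    # Space-separated words without operators are all optional in boolean mode.
    return sql, (" ".join(safe),)

sql, params = build_fulltext_query(["right", "shoulder"])
# The pair would then be passed to a DB-API cursor: cursor.execute(sql, params)
```

Because every candidate word is optional, rows containing only "right" or only "shoulder" also match; prefixing each word with `+` would require all of them, at the cost of failing when Soundex returns several alternatives for one search term.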

jbrond answered Oct 14 '22 01:10