I have about 2000 rows in a MySQL database.
Each row is a max of 300 characters and contains a sentence or two.
I use MySQL's built-in full-text search to search these rows.
I would like to add a feature so that typos and accidental misspellings are corrected, if possible.
For example, if someone types "right shlder" into the search box, this would equate to "right shoulder" when performing the search.
What are your suggestions on the simplest way to add this kind of functionality? Is it worth adding an external search engine of some kind, like Lucene? (It seems like for such a small dataset, this is overkill.) Or is there a simpler way?
Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall. Most full text search implementations use an "inverted index".
To perform a case-sensitive full-text search, use a case-sensitive or binary collation for the indexed columns. For example, a column that uses the utf8mb4 character set can be assigned a collation of utf8mb4_0900_as_cs or utf8mb4_bin to make it case-sensitive for full-text searches.
MySQL has support for full-text indexing and searching: A full-text index in MySQL is an index of type FULLTEXT. Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.
The basic query format of full-text searches in MySQL should be similar to the following: SELECT * FROM table_name WHERE MATCH(column_name) AGAINST('string' IN NATURAL LANGUAGE MODE); When MATCH() is used in a WHERE clause, the rows are automatically sorted with the highest relevance first.
I think you should use SOUNDS LIKE or SOUNDEX().
As your data set is so small, one solution may be to create a new table to store the individual words or soundex values contained in each text field and use SOUNDS LIKE on that table.
e.g.:
SELECT * FROM `table` WHERE id IN ( SELECT refid FROM tableofwords WHERE word SOUNDS LIKE 'right' OR word SOUNDS LIKE 'shlder' )
see: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html
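To see why SOUNDS LIKE can match "shlder" to "shoulder", here is a minimal sketch of the standard American Soundex algorithm. It is written in Python purely for illustration (the function is not part of the answer above), and MySQL's own SOUNDEX() is documented as using an older variant that can return longer codes, so treat the exact values as an approximation of what MySQL computes.

```python
def soundex(word: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    # Map each consonant to its Soundex digit; vowels, h, w, y have no digit.
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""

    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        # Emit a digit only when it differs from the previous code.
        if digit and digit != prev:
            result += digit
        # 'h' and 'w' do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = digit
    # Pad with zeros and cut to the usual four characters.
    return (result + "000")[:4]


print(soundex("shoulder"))  # S436
print(soundex("shlder"))    # S436
print(soundex("right"))     # R230
```

Because the missing vowels in "shlder" carry no Soundex digit, both spellings collapse to the same code. The flip side is that unrelated words with similar consonants also share codes, so expect some false positives.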
I believe it is not possible to wildcard search the string :(
MySQL doesn't support SOUNDEX search in fulltext.
If you want to implement a Lucene-like framework, it means that you have to take all the documents, split them into words, and then build an index entry for each word.
When someone searches for "right shlder", you have to make a SOUNDEX search for each word in the words table:
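The word-splitting step can be sketched as follows. This is a rough illustration in Python, not the answerer's code: the sample rows and the name `build_word_index` are made up, and in practice the result would be stored in a words table rather than kept in memory.

```python
import re
from collections import defaultdict


def build_word_index(rows):
    """Map each distinct lowercase word to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(row_id)
    return index


# Hypothetical sample data standing in for the ~2000 database rows.
rows = {
    1: "Pain in the right shoulder after lifting.",
    2: "Left knee swelling, right ankle fine.",
}

index = build_word_index(rows)
print(sorted(index["right"]))     # [1, 2]
print(sorted(index["shoulder"]))  # [1]
```

With about 2000 short rows, the resulting vocabulary is small enough that comparing every search word against it is cheap.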
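$search = 'right shlder';
// Split the search string into words and take the soundex code of each.
if (preg_match_all('/\w+/', $search, $matches)) {
    $sounds = array_map('soundex', $matches[0]);
    // PHP's soundex() is 4 characters; MySQL's SOUNDEX() can be longer, so compare the first 4.
    $query = "SELECT word FROM words_list WHERE LEFT(SOUNDEX(word), 4) IN ('" . join("','", $sounds) . "')";
}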
and then make a fulltext search:
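$query2 = "SELECT * FROM `table` WHERE MATCH(fulltextcolumn) AGAINST ('" . join(' ', $results) . "' IN BOOLEAN MODE)";
Where $results is an array with the words returned by the first query (escape them before building the query). In boolean mode, words separated by spaces are each optional, which behaves like an OR.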
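The two-step idea (map each typed word to known words, then feed those words to the full-text search) can also be sketched without Soundex. The example below swaps in Python's `difflib.get_close_matches`, which ranks candidates by string similarity rather than by sound. That is a different technique from the answer above, shown only as an illustration. The vocabulary list, the `expand_query` name, and the 0.75 cutoff are all assumptions.

```python
import difflib
import re


def expand_query(search, vocabulary, cutoff=0.75):
    """Replace each unknown search word with its closest vocabulary words."""
    terms = []
    for word in re.findall(r"\w+", search.lower()):
        if word in vocabulary:
            terms.append(word)  # already a known word, keep as typed
        else:
            matches = difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff)
            # Fall back to the original word when nothing is similar enough.
            terms.extend(matches or [word])
    # Space-separated terms can be passed to a boolean-mode full-text search.
    return " ".join(terms)


# Hypothetical vocabulary, e.g. the distinct words stored in the words table.
vocabulary = ["right", "left", "shoulder", "knee", "ankle"]

print(expand_query("right shlder", vocabulary))  # right shoulder
```

The returned string would go into the AGAINST(...) clause as a bound parameter. Tuning the cutoff trades missed corrections against spurious ones, so it is worth testing against real search logs.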