Best Practice For Levenshtein Distance on SQL Server

I have a web and a mobile dictionary application that use SQL Server. I am trying to implement a simple version of a "did you mean" feature: if the phrase the user entered does not exist in the database, I need to make suggestions.

I am planning to use the Levenshtein distance algorithm, but there is one point I can't figure out: do I need to calculate the Levenshtein distance between the user's entry and every word in my database, one by one?

Let's assume that I have one million words in my database. When a user enters an incorrect word, will I have to calculate the distance a million times?

Obviously that would take a great deal of time. What is the best practice for this situation?

Umut Derbentoğlu asked Nov 27 '25 22:11



1 Answer

Have you already looked at the built-in SOUNDEX function that is available in SQL Server?

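You could use a trigger that calculates the SOUNDEX code of a column and saves it next to that column each time the column is updated. When searching, you calculate the SOUNDEX code of the search term and compare it with the stored SOUNDEX column in the table. That comparison is a plain equality match, so it can use an index and returns only a small set of similar-sounding words instead of the whole table.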
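To illustrate the idea outside the database, here is a minimal Python sketch. It assumes a simplified American Soundex, which may not match the output of SQL Server's SOUNDEX in every case, and it assumes you then rank the surviving candidates by Levenshtein distance as the question intends. The names `soundex`, `levenshtein`, `build_index` and `suggest` are hypothetical helpers made up for this example, and the dictionary stands in for the stored SOUNDEX column.

```python
from collections import defaultdict

# Letter groups used by American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Simplified American Soundex: first letter plus three digits."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


def build_index(words):
    """Stand-in for the stored SOUNDEX column: code -> list of words."""
    index = defaultdict(list)
    for word in words:
        index[soundex(word)].append(word)
    return index


def suggest(term, index, limit=5):
    """Fetch words sharing the term's code, then rank by edit distance."""
    candidates = index.get(soundex(term), [])
    ranked = sorted(candidates, key=lambda w: (levenshtein(term, w), w))
    return ranked[:limit]


words = ["receive", "recipe", "relieve", "banana", "deceive"]
index = build_index(words)
suggestions = suggest("recieve", index)
```

With this word list only "receive" and "recipe" share a code with "recieve", so the distance is computed twice rather than five times. One caveat: Soundex always keeps the first letter, so a misspelling whose first letter is wrong will land in a different bucket and be missed.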

Frederik Gheysels answered Nov 29 '25 19:11



