Is there a way to search MySQL database for similar words (mean not identical words) . For example : user searches in database for word "abcd" and there is a word "abd" in the database so the search engine or the program Ask the user "Do you mean [abd] ? " as in most of search engines in the web ? Please notice that the search word is not a part of the existing word (can't use "like")
Select the Object search command: 1 In the Search text field, enter the text that needs to be searched (e.g. ... 2 From the Database drop-down menu, select the database to search in 3 In the Objects drop-down list, select the object types to search in, or leave them all checked More items...
Learn SQL for interviews using SQL Course by GeeksforGeeks. SOUNDEX () Function can find the inconsistency in names. SOUNDEX () can evaluate the similarity of two names. SOUNDEX () only works well when we do have 1 or 2 tokens. Names generally contain 1-2 tokens, so it works fine with Name.
Some of the examples of search that a developer / DBA may make within a database or database server are: searching for a database object, searching for occurance of particular text within database objects, searching within the schema of a database object, search within the results of query results or entire tables, etc..
There are may ways to look for similar strings in a SQL Server column. The most common method is to make use of LIKE operator. Let us see the different ways to look for similar string in a table. Both charindex and patindex look for the match and return its position in the string.
Have a look at the Damerau-Levenshtein distance algorithm. It calculates the "distance" between two strings and determines how many steps it takes to transform one string into another. The less steps the closer the two strings are.
This article shows the algorithm implemented as a MySQL stored function.
The algorithm is so much better than LIKE or SOUNDEX.
I believe Google uses crowd sourced data rather than an algorithm. ie if a user types in abcd, clicks on the back button and then immediately searches for abd then it establishes a relationship between the two search terms as the user wasn't happy with the results. Once you have a very large community searching then the pattern appears.
Since the link in Dave Barker's answer is dead, here is the code from an archived version of the website:
CREATE FUNCTION LEVENSHTEIN (s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS INT DETERMINISTIC BEGIN DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; DECLARE s1_char CHAR; DECLARE cv0, cv1 VARBINARY(256); SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; IF s1 = s2 THEN RETURN 0; ELSEIF s1_len = 0 THEN RETURN s2_len; ELSEIF s2_len = 0 THEN RETURN s1_len; ELSE WHILE j <= s2_len DO SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; END WHILE; WHILE i <= s1_len DO SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; WHILE j <= s2_len DO SET c = c + 1; IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; IF c > c_temp THEN SET c = c_temp; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; IF c > c_temp THEN SET c = c_temp; END IF; SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; END WHILE; SET cv1 = cv0, i = i + 1; END WHILE; END IF; RETURN c; END
To note:
Maximum length of input strings is 255 characters. I’m sure you could edit the function to support more if needed.
I’ve tested it with international characters on a utf8_bin column and it seemed to work, but I’ve not tested that capability exstensively.
I’ve only tested it on MySQL 5.0+. No idea how it will work on versions less than that.
And as a bonus I also created a helper function that returns the ratio (as a percentage) of different : same characters which can be more helpful than just a straight edit distance (idea from here).
CREATE FUNCTION LEVENSHTEIN_RATIO (s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS INT DETERMINISTIC BEGIN DECLARE s1_len, s2_len, max_len INT; SET s1_len = LENGTH(s1), s2_len = LENGTH(s2); IF s1_len > s2_len THEN SET max_len = s1_len; ELSE SET max_len = s2_len; END IF; RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100); END
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With