if i have two strings in mysql:
@a="Welcome to Stack Overflow" @b=" Hello to stack overflow";
is there a way to get the similarity percentage between those two string using MYSQL? here for example 3 words are similar and thus the similarity should be something like:
count(similar words between @a and @b) / (count(@a)+count(@b) - count(intersection))  
 and thus the result is 3/(4 + 4 - 3)= 0.6
 any idea is highly appreciated!
Hamming Distance, named after the American mathematician, is the simplest algorithm for calculating string similarity. It checks the similarity by comparing the changes in the number of positions between the two strings.
STRCMP() function in MySQL is used to compare two strings. If both of the strings are same then it returns 0, if the first argument is smaller than the second according to the defined order it returns -1 and it returns 1 when the second one is smaller the first one.
you can use this function (cop^H^H^Hadapted from http://www.artfulsoftware.com/infotree/queries.php#552):
CREATE FUNCTION `levenshtein`( s1 text, s2 text) RETURNS int(11)     DETERMINISTIC BEGIN      DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;      DECLARE s1_char CHAR;      DECLARE cv0, cv1 text;      SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;      IF s1 = s2 THEN        RETURN 0;      ELSEIF s1_len = 0 THEN        RETURN s2_len;      ELSEIF s2_len = 0 THEN        RETURN s1_len;      ELSE        WHILE j <= s2_len DO          SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;        END WHILE;        WHILE i <= s1_len DO          SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;          WHILE j <= s2_len DO            SET c = c + 1;            IF s1_char = SUBSTRING(s2, j, 1) THEN               SET cost = 0; ELSE SET cost = 1;            END IF;            SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;            IF c > c_temp THEN SET c = c_temp; END IF;              SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;              IF c > c_temp THEN                 SET c = c_temp;               END IF;              SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;          END WHILE;          SET cv1 = cv0, i = i + 1;        END WHILE;      END IF;      RETURN c;    END   and for getting it as XX% use this function
CREATE FUNCTION `levenshtein_ratio`( s1 text, s2 text ) RETURNS int(11)     DETERMINISTIC BEGIN      DECLARE s1_len, s2_len, max_len INT;      SET s1_len = LENGTH(s1), s2_len = LENGTH(s2);      IF s1_len > s2_len THEN         SET max_len = s1_len;       ELSE         SET max_len = s2_len;       END IF;      RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);    END 
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With