I have a question related to SQL.
I want to match two fields for similarities and return a percentage on how similar it is.
For example if I have a field called doc, which contains the following
This is my first assignment in SQL
and in another field I have something like
My first assignment in SQL
I want to know how I can check the similarities between the two and return by how much percent.
I did some research and wanted a second opinion plus I never asked for source code. Ive looked at Soundex(), Difference(), Fuzzy string matching using Levenshtein distance algorithm.
LIKE operator is used for pattern matching, and it can be used as -. % – It matches zero or more characters.
The SQL LIKE OperatorThe LIKE operator is used in a WHERE clause to search for a specified pattern in a column. There are two wildcards often used in conjunction with the LIKE operator: The percent sign (%) represents zero, one, or multiple characters. The underscore sign (_) represents one, single character.
Pattern matching is used to determine whether source files of high-level languages are syntactically correct. It is also used to find and replace a matching pattern in a text or code with another text/code. Any application that supports search functionality uses pattern matching in one way or another.
You didn't say what version of Oracle you are using. This example is based on 11g version.
You can use edit_distance
function of utl_match package to determine how many characters you need to change in order to turn one string to another. greatest function returns the greatest value in the list of passed in parameters. Here is an example:
-- sample of data
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
-- the query
select trunc(((greatest(length(col1), length(col2)) -
(utl_match.edit_distance(col2, col1))) * 100) /
greatest(length(col1), length(col2)), 2) as "%"
from t1
result:
%
----------
70.58
Addendum
As @jonearles correctly pointed out, it is much simpler to use edit_distance_similarity
function of utl_match
package.
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
select utl_match.edit_distance_similarity(col1, col2) as "%"
from t1
;
Result:
%
----------
71
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With