I am wanting to know if there is a way, in Oracle SQL, to compare two strings for similarities other than =
or like
. For instance, SQL Server has a function difference(str1, str2)
that compares the two strings and gives a similarity rating (0 to 4). Not exactly what I want but that would still be extremely helpful.
I was hoping specifically for functions or methods that would:
The primary use would be for strings of same length, that contain numbers (IDs, phone numbers, etc.) For my purposes, I would use it to find possible matches in which letters/numbers may have been transposed. Soundex(string)
works well for alpha strings but seems to ignores numbers (for good reason).
I do not have privileges to create functions on my own, but if someone knows a method of doing that I would interested in hearing it. One solution to compare char-to-char (that only works if you know the MAX number of characters) is:
--For char-to-char counts
Decode(substr(ID1,1,1), substr(ID2,1,1), 1, 0) +
Decode(substr(ID1,2,1), substr(ID2,2,1), 1, 0) +
[...]
Decode(substr(ID1,N,1), substr(ID2,N,1), 1, 0)
But that is about as inelegant as you can get.
Any help would be greatly appreciated.
It sounds like you're looking for the UTL_MATCH package
SELECT utl_match.edit_distance( string1, string2 )
FROM dual
tells you the number of edits required to transform string1 into string2
SQL> select utl_match.edit_distance( 'Bear', 'berry' ) from dual;
UTL_MATCH.EDIT_DISTANCE('BEAR','BERRY')
---------------------------------------
3
There are also a couple of similarity functions EDIT_DISTANCE_SIMILARITY
and JARO_WINKLER_SIMILARITY
that give similarity scores between 0 and 100 that give you an idea of how similar the strings are.
There are several ways to solve this in Oracle:
You can create a stored procedure.
You can upload Java code to an Oracle DB and use the functions defined in the code (intro).
You can fetch the strings from the database and compare them in your application.
Pros/cons:
Store procedures are somewhat hard to write and maintain and they might be slow. But they are a standard, often used tool, so unless your company has a strict "no-go" policy, they are usually the "solution" (often like regular expression which solve one problem and create two new ones).
Java code is an extremely powerful tool but I have seen many Oracle installations and so far no one used Java code. I'm not sure what the reasons are, many people just seem to be wary but nothing solid ever materialized. Also note that DB servers are optimized for IO and less so for CPU. So complex Java code might ruin your performance in more ways than you might expect (following the lines of "many are wary")
The last solution always works but depending on what you need, it might just not be an option. On the other hand, I've seen code that performed much better by downloading a lot of the data and performing the complex processing in the app. In one example, the query would take 15 seconds and downloading + command line grep(1)
took 0.3s.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With