Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Advanced String Comparisons in Oracle SQL

I am wanting to know if there is a way, in Oracle SQL, to compare two strings for similarities other than = or like. For instance, SQL Server has a function difference(str1, str2) that compares the two strings and gives a similarity rating (0 to 4). Not exactly what I want but that would still be extremely helpful.

I was hoping specifically for functions or methods that would:

  1. Compare a string character by character (return the number of exact matches)
  2. Tell how many characters are in one string, but in the wrong place

The primary use would be for strings of same length, that contain numbers (IDs, phone numbers, etc.) For my purposes, I would use it to find possible matches in which letters/numbers may have been transposed. Soundex(string) works well for alpha strings but seems to ignores numbers (for good reason).

I do not have privileges to create functions on my own, but if someone knows a method of doing that I would interested in hearing it. One solution to compare char-to-char (that only works if you know the MAX number of characters) is:

 --For char-to-char counts
    Decode(substr(ID1,1,1),  substr(ID2,1,1), 1,  0) +
    Decode(substr(ID1,2,1),  substr(ID2,2,1), 1,  0) +
    [...]
    Decode(substr(ID1,N,1),  substr(ID2,N,1), 1,  0)

But that is about as inelegant as you can get.

Any help would be greatly appreciated.

like image 482
Phillip Avatar asked Feb 22 '12 16:02

Phillip


2 Answers

It sounds like you're looking for the UTL_MATCH package

SELECT utl_match.edit_distance( string1, string2 )
  FROM dual

tells you the number of edits required to transform string1 into string2

SQL> select utl_match.edit_distance( 'Bear', 'berry' ) from dual;

UTL_MATCH.EDIT_DISTANCE('BEAR','BERRY')
---------------------------------------
                                      3

There are also a couple of similarity functions EDIT_DISTANCE_SIMILARITY and JARO_WINKLER_SIMILARITY that give similarity scores between 0 and 100 that give you an idea of how similar the strings are.

like image 159
Justin Cave Avatar answered Oct 21 '22 16:10

Justin Cave


There are several ways to solve this in Oracle:

  1. You can create a stored procedure.

  2. You can upload Java code to an Oracle DB and use the functions defined in the code (intro).

  3. You can fetch the strings from the database and compare them in your application.

Pros/cons:

Store procedures are somewhat hard to write and maintain and they might be slow. But they are a standard, often used tool, so unless your company has a strict "no-go" policy, they are usually the "solution" (often like regular expression which solve one problem and create two new ones).

Java code is an extremely powerful tool but I have seen many Oracle installations and so far no one used Java code. I'm not sure what the reasons are, many people just seem to be wary but nothing solid ever materialized. Also note that DB servers are optimized for IO and less so for CPU. So complex Java code might ruin your performance in more ways than you might expect (following the lines of "many are wary")

The last solution always works but depending on what you need, it might just not be an option. On the other hand, I've seen code that performed much better by downloading a lot of the data and performing the complex processing in the app. In one example, the query would take 15 seconds and downloading + command line grep(1) took 0.3s.

like image 43
Aaron Digulla Avatar answered Oct 21 '22 18:10

Aaron Digulla