Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get almost matching string from Oracle table?

I have a table in Oracle with four columns. Table Data in Oracle.

Now user can enter input string as "operation Knee right" (which is valid) to my query and my query should return the ICD Code (IKR123) which matches most of the word in DiagnosisName column.

Following is my current query.(Not giving the proper output)

SELECT diagnosisname
FROM
  (SELECT diagnosisname,
    UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname)
  FROM icd_code
  ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
  )
WHERE ROWNUM<2;

This query giving me the output as "Left Knee Operation" but my expectation is "Right Knee Operation".

like image 819
shary.sharath Avatar asked Apr 29 '15 05:04

shary.sharath


1 Answers

There are few things to note about your usage of UTL_MATCH:

  • EDIT_DISTANCE_SIMILARITY : Returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match.
  • JARO_WINKLER_SIMILARITY : Returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match but tries to take into account possible data entry errors.

ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC

This is not going to give you correct result. Since, you are only considering the possible similarity, however, not considering the data entry errors. So, you must use JARO_WINKLER_SIMILARITY.

operation Knee right

You need to keep in mind the CASE of the input and the column values to compare. They must be in similar case for a correct match. You are passing the input in LOWERCASE, however, your column values are in INITCAP. Better convert both the column values and the input to a similar case.

Let's look at the below demonstration to understand:

SQL> WITH DATA AS(
  2  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  3  SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
  4  SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
  5  SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
  6  SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
  7  )
  8  SELECT t.*,
  9    utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
 10    UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
 11  FROM DATA t
 12  ORDER BY jws DESC
 13  /

DIAGNOSIS_NAME       ICD_CO        EDS        JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123         20         72
Knee Operation       IK123          20         70
Heart Operation      IH123          25         68
Left Knee Operation  IKL123         25         64
Fever                IF123          15         47

SQL>

So, you see how both are different from each other. jaro_winkler_similarity does a better job in identifying the data entry errors and giving the most close match. Based on that, simply choose the first row after sorting it in descending order:

SQL> WITH DATA AS(
  2  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  3  SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
  4  SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
  5  SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
  6  SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
  7  )
  8  SELECT diagnosis_name
  9  FROM
 10    (SELECT t.*,
 11      utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
 12      UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
 13    FROM DATA t
 14    ORDER BY jws DESC
 15    )
 16  WHERE rownum = 1
 17  /

DIAGNOSIS_NAME
--------------------
Right Knee Operation

SQL>
like image 61
Lalit Kumar B Avatar answered Oct 02 '22 14:10

Lalit Kumar B