I have a table in Oracle with four columns.
Now user can enter input string as "operation Knee right" (which is valid) to my query and my query should return the ICD Code (IKR123) which matches most of the word in DiagnosisName column.
Following is my current query.(Not giving the proper output)
SELECT diagnosisname
FROM
(SELECT diagnosisname,
UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname)
FROM icd_code
ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
)
WHERE ROWNUM<2;
This query giving me the output as "Left Knee Operation" but my expectation is "Right Knee Operation".
There are few things to note about your usage of UTL_MATCH:
ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
This is not going to give you correct result. Since, you are only considering the possible similarity, however, not considering the data entry errors. So, you must use JARO_WINKLER_SIMILARITY.
operation Knee right
You need to keep in mind the CASE of the input and the column values to compare. They must be in similar case for a correct match. You are passing the input in LOWERCASE, however, your column values are in INITCAP. Better convert both the column values and the input to a similar case.
Let's look at the below demonstration to understand:
SQL> WITH DATA AS(
2 SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
3 SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
4 SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
5 SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
6 SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
7 )
8 SELECT t.*,
9 utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
10 UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
11 FROM DATA t
12 ORDER BY jws DESC
13 /
DIAGNOSIS_NAME ICD_CO EDS JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123 20 72
Knee Operation IK123 20 70
Heart Operation IH123 25 68
Left Knee Operation IKL123 25 64
Fever IF123 15 47
SQL>
So, you see how both are different from each other. jaro_winkler_similarity does a better job in identifying the data entry errors and giving the most close match. Based on that, simply choose the first row after sorting it in descending order:
SQL> WITH DATA AS(
2 SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
3 SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
4 SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
5 SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
6 SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
7 )
8 SELECT diagnosis_name
9 FROM
10 (SELECT t.*,
11 utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
12 UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
13 FROM DATA t
14 ORDER BY jws DESC
15 )
16 WHERE rownum = 1
17 /
DIAGNOSIS_NAME
--------------------
Right Knee Operation
SQL>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With