Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: Select strings which have equal words

Suppose I have a table of strings, like this:

VAL
-----------------
Content of values
Values identity
Triple combo
my combo
sub-zero combo

I want to find strings which have equal words. The result set should be like

VAL                MATCHING_VAL
------------------ ------------------
Content of values  Values identity
Triple combo       My combo
Triple combo       sub-zero combo

or at least something like this. Can you help?

like image 332
Mike Avatar asked Dec 15 '15 22:12

Mike


3 Answers

One method is to use a hack for regular expressions:

select t1.val, t2.val
from t t1 join
     t t2
     on regexp_like(t1.val, replace(t2.val, ' ', '|');

You might want the case to be identical as well:

     on regexp_like(lower(t1.val), replace(lower(t2.val), ' ', '|');
like image 172
Gordon Linoff Avatar answered Sep 28 '22 03:09

Gordon Linoff


You could use a combination of SUBSTRING and LIKE.

use charIndex(" ") to split the words up in the substring if thats what you want to do.

like image 44
Anton Avatar answered Sep 28 '22 04:09

Anton


Using some of the [oracle internal similiarity] found in UTL_Match (https://docs.oracle.com/database/121/ARPLS/u_match.htm#ARPLS71219) matching...

This logic is more for matching names or descriptions that are 'Similar' and where phonetic spellings or typo's may cause the records not to match.

By adjusting the .5 below you can see how the %'s get you closer and closer to perfect matches.

with cte as (
select 'Content of values' val from dual union all
select 'Values identity' val from dual union all
select 'triple combo' from dual union all
select 'my combo'from dual union all
select 'sub-zero combo'from dual)

select a.*, b.*, utl_match.edit_distance_similarity(a.val, b.val) c, UTL_MATCH.JARO_WINKLER(a.val,b.val) JW 
from cte a
cross join cte b
where UTL_MATCH.JARO_WINKLER(a.val,b.val) > .5
order by utl_match.edit_distance_similarity(a.val, b.val) desc

and screenshot of query/output.

Or we could use an inner join and > if we only want one way compairisons...

select a.*, b.*, utl_match.edit_distance_similarity(a.val, b.val) c, UTL_MATCH.JARO_WINKLER(a.val,b.val) JW 
from cte a
inner join cte b
  on A.Val > B.Val
where utl_match.jaro_winkler(a.val,b.val) > .5
order by utl_match.edit_distance_similarity(a.val, b.val) desc

this returns the 3 desired records.

But this does not explicitly check each any word matches. which was your base requirement. I just wanted you to be aware of alternatives.

enter image description here

like image 25
xQbert Avatar answered Sep 28 '22 03:09

xQbert