
How to check if first five characters of one field match another?

Assuming I have the following table:

AAAAAA
AAAAAB
CCCCCC

How could I craft a query that would tell me that AAAAAA and AAAAAB are similar (as they share five characters in a row)? Ideally the query would check whether the two fields share five consecutive characters anywhere in the string, but this seems outside the scope of SQL. Is it something I should write into a C# application instead?

Ideally the query would add another column that displays: Similar to 'AAAAAA', 'AAAAAB'

Michael A asked Dec 17 '22 04:12


2 Answers

I suggest you do not try to violate 1NF by introducing a multi-valued attribute.

Noting that SUBSTRING is highly portable:

WITH T 
     AS
     (
      SELECT * 
        FROM (
              VALUES ('AAAAAA'), 
                     ('AAAAAB'), 
                     ('CCCCCC')
             ) AS T (data_col)
     )
SELECT T1.data_col, 
       T2.data_col AS data_col_similar_to
  FROM T AS T1, T AS T2
 WHERE T1.data_col < T2.data_col
       AND SUBSTRING(T1.data_col, 1, 5) 
              = SUBSTRING(T2.data_col, 1, 5);

Alternatively, the second predicate can be written with LIKE (using + for string concatenation, as in SQL Server):

T1.data_col LIKE SUBSTRING(T2.data_col, 1, 5) + '%';

onedaywhen answered Dec 18 '22 16:12


This will find all matches, including those in the middle of the string. It will not perform well on a big table.

declare @t table(a varchar(20))

insert @t select 'AAAAAA'
insert @t select 'AAAAAB'
insert @t select 'CCCCCC'
insert @t select 'ABCCCCC'
insert @t select 'DDD'

declare @compare smallint = 5

;with cte as
(
select a, left(a, @compare) suba, 1 h
from @t
union all
select a, substring(a, h + 1, @compare), h+1
from cte where cte.h + @compare <= len(a)
)
select t.a, cte.a match from @t t 
-- if you don't want the null matches, remove the 'left' from this join 
left join cte on charindex(suba, t.a) > 0 and t.a <> cte.a  
group by t.a, cte.a

Result (non-null matches only; with the left join as written, DDD is also returned with a NULL match):

a                    match
-------------------- ------
AAAAAA               AAAAAB
AAAAAB               AAAAAA
ABCCCCC              CCCCCC
CCCCCC               ABCCCCC
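
The question also raises doing this check in application code. As a rough sketch of the same sliding-window idea outside SQL (Python is used here purely for illustration, and the helper names `windows` and `similar_pairs` are hypothetical, not part of either answer), each value is broken into every run of five consecutive characters and two values count as similar when their sets of runs overlap:

```python
from itertools import combinations


def windows(s, n):
    """Return the set of every run of n consecutive characters in s.

    Strings shorter than n produce an empty set, so they never match.
    """
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similar_pairs(values, n=5):
    """Return sorted pairs of distinct values sharing at least one n-character run."""
    subs = {v: windows(v, n) for v in values}
    return [
        (a, b)
        for a, b in combinations(sorted(subs), 2)
        if subs[a] & subs[b]  # non-empty intersection means a shared run
    ]


if __name__ == "__main__":
    rows = ["AAAAAA", "AAAAAB", "CCCCCC", "ABCCCCC", "DDD"]
    for a, b in similar_pairs(rows):
        print(a, b)
```

Unlike the query above, this sketch lists each pair once rather than in both directions. It still compares every pair of values, so it has the same scaling concern on a large data set.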

t-clausen.dk answered Dec 18 '22 17:12