Assuming I have the following table:
AAAAAA
AAAAAB
CCCCCC
How could I craft a query that tells me that AAAAAA and AAAAAB are similar (as they share five characters in a row)? Ideally, the query would check whether the two fields share five characters in a row anywhere in the string, but this seems outside the scope of SQL. Is it something I should write in a C# application instead?
Ideally the query would add another column that displays: Similar to 'AAAAAA', 'AAAAAB'
I suggest you do not try to violate 1NF by introducing a multi-valued attribute.
Noting that SUBSTRING is highly portable:
WITH T
AS
(
    SELECT *
    FROM (
        VALUES ('AAAAAA'),
               ('AAAAAB'),
               ('CCCCCC')
    ) AS T (data_col)
)
SELECT T1.data_col,
       T2.data_col AS data_col_similar_to
FROM T AS T1, T AS T2
WHERE T1.data_col < T2.data_col
  AND SUBSTRING(T1.data_col, 1, 5)
      = SUBSTRING(T2.data_col, 1, 5);
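Alternatively, the SUBSTRING comparison can be replaced with LIKE (using + for string concatenation, as in SQL Server):
AND T1.data_col LIKE SUBSTRING(T2.data_col, 1, 5) + '%';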
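Since the question also asks whether this belongs in application code, here is a minimal Python sketch of the same prefix comparison done client-side. The function name prefix_pairs and its parameter n are hypothetical, chosen for illustration. Unlike the query, it explicitly skips values shorter than n characters, and it removes duplicates before pairing.

```python
from itertools import combinations


def prefix_pairs(values, n=5):
    """Return pairs (a, b) with a < b whose first n characters are equal.

    Hypothetical helper: mirrors the self-join on SUBSTRING(col, 1, n),
    with the a < b ordering keeping each pair from being reported twice.
    """
    pairs = []
    # sorted(set(...)) removes duplicates and guarantees a < b in each pair
    for a, b in combinations(sorted(set(values)), 2):
        # Assumption: values shorter than n characters never count as similar
        if len(a) >= n and a[:n] == b[:n]:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    print(prefix_pairs(["AAAAAA", "AAAAAB", "CCCCCC"]))
```

This compares every pair, so it is quadratic in the number of values, much like the self-join.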
This finds all matches, including those in the middle of the string, but it will not perform well on a big table:
declare @t table(a varchar(20))
insert @t select 'AAAAAA'
insert @t select 'AAAAAB'
insert @t select 'CCCCCC'
insert @t select 'ABCCCCC'
insert @t select 'DDD'
declare @compare smallint = 5
;with cte as
(
select a, left(a, @compare) suba, 1 h
from @t
where len(a) >= @compare -- skip values too short to contain a full @compare-length substring
union all
select a, substring(a, h + 1, @compare), h+1
from cte where cte.h + @compare <= len(a)
)
select t.a, cte.a match from @t t
-- if you don't want the null matches, remove the 'left' from this join
left join cte on charindex(suba, t.a) > 0 and t.a <> cte.a
group by t.a, cte.a
Result (NULL matches, such as the row for 'DDD', omitted):
a                    match
-------------------- --------------------
AAAAAA               AAAAAB
AAAAAB               AAAAAA
ABCCCCC              CCCCCC
CCCCCC               ABCCCCC
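For comparison, the "anywhere in the string" check could also be done in application code, as the question suggests. The following is a rough Python sketch, not a translation of the query: it collects every substring of length n for each value and treats two values as similar when those sets overlap. The names ngrams and similar are hypothetical, and values shorter than n are assumed to match nothing.

```python
def ngrams(s, n):
    """Return the set of all length-n substrings of s (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similar(values, n=5):
    """Map each distinct value to the sorted list of other values sharing a length-n substring.

    Hypothetical helper for illustration; it compares every pair of values,
    so it is quadratic in the number of distinct values.
    """
    grams = {v: ngrams(v, n) for v in set(values)}
    result = {}
    for a, grams_a in grams.items():
        # Two values match when their substring sets have a non-empty intersection
        result[a] = sorted(b for b, grams_b in grams.items()
                           if b != a and grams_a & grams_b)
    return result


if __name__ == "__main__":
    data = ["AAAAAA", "AAAAAB", "CCCCCC", "ABCCCCC", "DDD"]
    for value, matches in sorted(similar(data).items()):
        print(value, matches)
```

On the sample data this pairs AAAAAA with AAAAAB and CCCCCC with ABCCCCC, and leaves DDD without matches.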