Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL - How to find word with the most similar beginning

How to find varchar-word that has the most similar beginning of the specified word in MySQL database?

For example:

+-------------------+
|    word_column    | 
+-------------------+
| StackOferflow     |
| StackExchange     |
| MetaStackExchange |
|       ....        |
+-------------------+

query: call get_with_similar_begin('StackExch_bla_bla_bla');
output: 'StackExchange'

query: call get_with_similar_begin('StackO_bla_bla_bla');
output: 'StackOferflow'


UPDATE :

Select * from words where word_column like 'StackExch_bla_bla_bla' will not give the correct result, because 'StackExchange' does not match this filter.

Additional info: I has BTREE-index on word_column and I would like to use it whenever possible

like image 717
Palindromer Avatar asked Oct 22 '17 09:10

Palindromer


2 Answers

In SQL Server we can use CTE like below query to achieve what you want:

declare @search nvarchar(255) = 'StackExch_bla_bla_bla';

-- A cte that contains `StackExch_bla_bla_bla` sub-strings: {`StackExch_bla_bla_bla`, `StackExch_bla_bla_bl`, ...,  `S`}
with cte(part, lvl) as (  
    select @search, 1
    union all 
    select substring(@search, 1, len(@search) - lvl), lvl + 1
    from cte
    where lvl < len(@search)
), t as (   -- Now below cte will find match level of each word_column
    select t.word_column, min(cte.lvl) matchLvl
    from yourTable t
    left join cte
      on t.word_column like cte.part+'%'
    group by t.word_column
)
select top(1) word_column
from t
where matchLvl is not null   -- remove non-matched rows
order by matchLvl;

SQL Server Fiddle Demo

I need more time to find a way in MySQL for it, Hope some MySQL experts answer faster ;).

My best try in MySQL is this:

select tt.word_column
from (
  select t.word_column, min(lvl) matchLvl
  from yourTable t
  join (
    select 'StackExch_bla_bla_bla' part, 1 lvl
    union all select 'StackExch_bla_bla_bl', 2
    union all select 'StackExch_bla_bla_b', 3
    union all select 'StackExch_bla_bla_', 4
    union all select 'StackExch_bla_bla', 5
    union all select 'StackExch_bla_bl', 6
    union all select 'StackExch_bla_b', 7
    union all select 'StackExch_bla_', 8
    union all select 'StackExch_bla', 9
    union all select 'StackExch_bl', 10
    union all select 'StackExch_b', 11
    union all select 'StackExch_', 12
    union all select 'StackExch', 13
    union all select 'StackExc', 14
    union all select 'StackEx', 15
    union all select 'StackE', 16
    union all select 'Stack', 17
    union all select 'Stac', 18
    union all select 'Sta', 19
    union all select 'St', 20
    union all select 'S', 21
  ) p on t.word_column like concat(p.part, '%')
  group by t.word_column
  ) tt
order by matchLvl
limit 1;

I think by creating a stored procedure and using a temp table to store values in p sub-select you can achieve what you want -HTH ;).

MySQL Fiddle Demo

like image 145
shA.t Avatar answered Sep 27 '22 19:09

shA.t


This is a slight variation on @shA.t's answer. The aggregation is not necessary:

select t.*, p.lvl
from yourTable t join
     (select 'StackExch_bla_bla_bla' as part, 1 as lvl union all
      select 'StackExch_bla_bla_bl', 2 union all
      select 'StackExch_bla_bla_b', 3 union all
      select 'StackExch_bla_bla_', 4 union all
      select 'StackExch_bla_bla', 5 union all
      select 'StackExch_bla_bl', 6 union all
      select 'StackExch_bla_b', 7 union all
      select 'StackExch_bla_', 8 union all
      select 'StackExch_bla', 9 union all
      select 'StackExch_bl', 10 union all
      select 'StackExch_b', 11 union all
      select 'StackExch_', 12 union all
      select 'StackExch', 13 union all
      select 'StackExc', 14 union all
      select 'StackEx', 15 union all
      select 'StackE', 16 union all
      select 'Stack', 17 union all
      select 'Stac', 18 union all
      select 'Sta', 19 union all
      select 'St', 20 union all
      select 'S', 21
     ) p
     on t.word_column like concat(p.part, '%')
order by matchLvl
limit 1;

A faster way is to use case:

select t.*,
       (case when t.word_column like concat('StackExch_bla_bla_bla', '%') then 'StackExch_bla_bla_bla'
             when t.word_column like concat('StackExch_bla_bla_bl', '%') then 'StackExch_bla_bla_bl'
             when t.word_column like concat('StackExch_bla_bla_b', '%') then 'StackExch_bla_bla_b'
             . . .
             when t.word_column like concat('S', '%') then 'S'
             else ''
        end) as longest_match
from t
order by length(longest_match) desc
limit 1;

Neither of these will make effective use of the index.

If you want a version that uses the index, then do the looping at the application layer, and repeated run the query as:

select t.*
from t
where t.word_column like 'StackExch_bla_bla_bla%'
limit 1;

Then stop when you hit the first match. MySQL should be using the index for the like comparison.

You can come pretty close to this using a union all:

(select t.*, 'StackExch_bla_bla_bla' as matching
 from t
 where t.word_column like 'StackExch_bla_bla_bla%'
 limit 1
) union all
(select t.*, 'StackExch_bla_bla_bl'
 from t
 where t.word_column like 'StackExch_bla_bla_bl%'
 limit 1
) union all
(select t.*, 'StackExch_bla_bla_b'
 from t
 where t.word_column like 'StackExch_bla_bla_b%'
 limit 1
) union al
. . .
(select t.*, 'S'
 from t
 where t.word_column like 'S%'
 limit 1
)
order by length(matching) desc
limit 1;
like image 25
Gordon Linoff Avatar answered Sep 27 '22 21:09

Gordon Linoff