Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

REGEXP to match a group string within another group list regardless of the order - SQL

Here is the scenario. How to utilize REGEXP to simulate IN operator yet to match all values in left side within right side regardless of the string order in either side. Also ANSI SQL solution can be achieved using left join and sub queries.

  • SQLFIDDLE DEMO

Sample table:

Parent table, Child table, Parent_Child. In order to not take more space on the question, I only post here a reulst of a Group_Concat Child query by Parent.

PID     NAME        CHILDREN    
1       dad john    dave,jill,lina
2       mum sandy   maryam,jack 
3       dad frank   henry,jill 
4       mum kate    maryam  
5       mum jean    dave

Expected Result: Select Parent who has their all children participated in something.

PID     NAME        CHILDRENREXGEX
3       dad frank   jill,henry
4       mum kate    maryam
5       mum jean    dave

Here is the REGEXP SQL solution: now the issue here, it doesn't return correct results if left side order/squence not match right side.

Query:

select 
    x.pid, x.name, x.children as childrenRexgex
from
    (select 
         p.pid, p.name, group_concat(c.name) as children
     from 
         parent as p
     inner join 
         parent_child as pc on p.pid = pc.pid
     join 
         child as c on pc.cid = c.cid
     group by 
         p.pid
     order by 
         c.name) as x
where 
    'dave,maryam,jill,henry' REGEXP x.children
;

Hence there are two aspects I would appreciate for the question:

  1. What is the best pattern to match all names in left side to the user defined list of right side regardless of the order?
  2. What could be the performance gained using REGEXP?
like image 884
bonCodigo Avatar asked Nov 04 '22 08:11

bonCodigo


1 Answers

Do you want to use regexp, or another solution is fine anyway? If I understand correctly, this query should give you the correct result:

select p.pid, parent.name, group_concat(child.name)
from
  (select pid
   from 
     parent_child inner join child
     on parent_child.cid = child.cid
   group by pid
   having sum(child.name in ('dave','henry','maryam','jill'))=count(*)) p
  inner join parent on p.pid=parent.pid
  inner join parent_child on p.pid=parent_child.pid
  inner join child on parent_child.cid=child.cid
group by p.pid

that gives:

PID     NAME        CHILDRENREXGEX
3       dad frank   jill,henry
4       mum kate    maryam
5       mum jean    dave

Anyway, to use your solution, I would suggest you to use group_concat with an order:

select
  x.pid,
  x.name,
  x.children as childrenRexgex
from(
  select
    p.pid,
    p.name, 
    group_concat(c.name order by c.name) as children,
    count(c.name) as counts
  from
    parent as p inner join parent_child as pc
    on p.pid = pc.pid
    join child as c
    on pc.cid = c.cid
  group by p.pid) as x
where 'dave,henry,jill,maryam'
  REGEXP x.children

and try to match the names already ordered. This is identical to your query, i only added an order by c.name inside group_concat, and i also ordered the string in the where condition.

Edit: If you really want to use a REGEXP, since MySql support to regular expressions is limited, i would suggest you to try to use LIB_MYSQLUDF_PREG. This is a general solution that doesn't work with standard MySql regular expressions.

If you need to match a string like this:

One,Two,Three,Four

with, for example, this:

Two,Four,Three,One

you have to use a regexp like this one:

"One,Two,Three,Four" REGEXP
"^(?=.*\bTwo\b)(?=.*\bFour\b)(?=.*\bThree\b)(?=.*\bOne\b)"

(check this question) And this is what it does:

  1. \bTwo\b matches full word Two, can be: Two Two, ,Two ,Two,
  2. .* word Two can be found anywhere in the string .*\bTwo\b
  3. (?=.*\bTwo\b) match full word Two, anywhere in the string, but forget the location and start to mach the next term from the beginning
  4. start to match other words

Still missing something? Yes, because if we have a match for "One,Two,Three,Four", also "One,Two,Three,Four,Five" will match. Maybe there's a better regexp for this, but my idea is: if they match, and have the same length, they have to be identical except for the order. So we can add this at the end of our regexp:

  1. .{length}$ remember that after all previous matches, we are still at the beginning, and ^.{length}$ matches a string of given length

So final code would be:

field1="One,Two,Three,Four"
field2="Two,Four,Three,One"

field1 REGEXP CONCAT("^(?=.*\b", 
                     REPLACE(field2, ",", "\b)(?=.*\b"),
                     "\b).{", LENGTH(field1), "}$")

Notice that this regexp is not supported by REGEXP, it should be supported on LIB_MYSQLUDF_PREG but I still haven't tested it. I will let you know. There might be other solutions, but I don't think it's possible to do any better with just REGEXP.

like image 57
fthiella Avatar answered Nov 08 '22 10:11

fthiella