Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sql select records with matching subsets

There are two sets of employees: managers and grunts.
For each manager, there's a table manager_meetings that holds a list of which meetings each manager attended. A similar table grunt_meetings holds a list of which meetings each grunt attended.

So:

manager_meetings                          grunt_meetings
managerID      meetingID                  gruntID        meetingID
1              a                          4              a
1              b                          4              b
1              c                          4              c
2              a                          4              d
2              b                          5              a 
3              c                          5              b
3              d                          5              c
3              e                          6              a 
                                          6              c 
                                          7              b 
                                          7              a

The owner doesn't like it when a manager and a grunt know exactly the same information. It makes his head hurt. He wants to identify this situation, so he can demote the manager to a grunt, or promote the grunt to a manager, or take them both golfing. The owner likes to golf.

The task is to list every combination of manager and grunt where both attended exactly the same meetings. If the manager attended more meeting than the grunt, no match. If the grunt attended more meetings than the manager, no match.

The expected results here are:

ManagerID            GruntID
2                    7
1                    5 

...because manager 2 and grunt 7 both attended (a,b), while manager 1 and grunt 5 both attended (a,b,c).

I can solve it in a clunky way, by pivoting up the subset of meetings in a subquery into XML, and comparing each grunt's XML list to each manager's XML. But that's horrible, and also I have to explain to the owner what XML is. And I don't like golfing.

Is there some better way to do "WHERE {subset1} = {subset2}"? It feels like I'm missing some clever kind of join.

SQL Fiddle

like image 459
SteveSmithSQL Avatar asked Jan 27 '14 22:01

SteveSmithSQL


People also ask

How do I get matching records in SQL?

(INNER) JOIN : Returns records that have matching values in both tables. LEFT (OUTER) JOIN : Returns all records from the left table, and the matched records from the right table. RIGHT (OUTER) JOIN : Returns all records from the right table, and the matched records from the left table.

What SQL clause is used to create subsets of data rows in a query?

The GROUP BY clause projects rows having common values into a smaller set of rows.

Which SQL command returns all records from one table and only matched records from second table?

Outer joins return all rows from one table and matching rows from the second table.

How do you SELECT a subset of data in SQL?

Type a "SELECT" statement that retrieves a subset of data from a table, using the following SQL code as a guide: SELECT name, address, city, date_last_purchased FROM customers WHERE state = 'CA'; Click the "Execute" button to run the "SELECT" statement query.


2 Answers

Here is a version that works:

select m.mId, g.gId, count(*) --select m.mid, g.gid, mm.meetingid, gm.meetingid as gmm
from manager m cross join
     grunt g left outer join
     (select mm.*, count(*) over (partition by mm.mid) as cnt
      from manager_meeting mm
     ) mm
     on mm.mid = m.mId full outer join
     (select gm.*, count(*) over (partition by gm.gid) as cnt
      from grunt_meeting gm
     ) gm
     on gm.gid = g.gid and gm.meetingid = mm.meetingid 
group by m.mId, g.gId, mm.cnt, gm.cnt
having count(*) = mm.cnt and mm.cnt = gm.cnt;

The string comparison method is shorter, perhaps easier to understand, and probably faster.

EDIT:

For your particular case of getting exact matches, the query can be simplified:

select mm.mId, gm.gId
from (select mm.*, count(*) over (partition by mm.mid) as cnt
      from manager_meeting mm
     ) mm join
     (select gm.*, count(*) over (partition by gm.gid) as cnt
      from grunt_meeting gm
     ) gm
     on gm.meetingid = mm.meetingid and
        mm.cnt = gm.cnt
group by mm.mId, gm.gId
having count(*) = max(mm.cnt);

This might be more competitive with the string version, both in terms of performance and clarity.

It counts the number of matches between a grunt and a manager. It then checks that this is all the meetings for each.

like image 186
Gordon Linoff Avatar answered Nov 09 '22 09:11

Gordon Linoff


An attempt at avenging Aaron's defeat – a solution using EXCEPT:

SELECT
  m.mID,
  g.gID
FROM
  manager AS m
INNER JOIN
  grunt AS g
ON  NOT EXISTS (
  SELECT meetingID
  FROM   manager_meeting
  WHERE  mID = m.mID
  EXCEPT
  SELECT meetingID
  FROM   grunt_meeting
  WHERE  gID = g.gID
)
AND NOT EXISTS (
  SELECT meetingID
  FROM   grunt_meeting
  WHERE  gID = g.gID
  EXCEPT
  SELECT meetingID
  FROM   manager_meeting
  WHERE  mID = m.mID
);

Basically, subtract a grunt's set of meetings from a manager's set of meetings, then the other way round. If neither result contains rows, the grunt and the manager attended the same set of meetings.

Please note that this query will match managers and grunts that never attended a single meeting.

like image 33
Andriy M Avatar answered Nov 09 '22 09:11

Andriy M