Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Return distinct pairs of names which have the same exact items in column

I want to find the distinct pairs of names in the table which have the same exact items in the items column. For instance:

CREATE TABLE t
(
    name    VARCHAR(255),
    item    VARCHAR(255)
);

INSERT INTO t VALUES("Alice", "Orange");
INSERT INTO t VALUES("Alice", "Pear");
INSERT INTO t VALUES("Alice", "Lemon");
INSERT INTO t VALUES("Bob", "Orange");
INSERT INTO t VALUES("Bob", "Pear");
INSERT INTO t VALUES("Bob", "Lemon");
INSERT INTO t VALUES("Charlie", "Pear");
INSERT INTO t VALUES("Charlie", "Lemon");

The answer here would be Alice,Bob because they took the exact same items.

I want to do it with double negation (using NOT EXISTS/NOT IN) only which I think is more well-suited to this question, but I couldn't come up with anything that is remotely close to being functional.

This is somewhat similar to this question but I'm using SQLite so I cannot use GROUP_CONCAT() but I was wondering how it would be done using relational division using NOT EXISTS/NOT IN.

like image 887
maregor Avatar asked Nov 04 '15 04:11

maregor


2 Answers

To get the number of common items between all pairs of names you can use the following query:

SELECT t1.name AS name1, t2.name AS name2, COUNT(*) AS cnt
FROM t AS t1
INNER JOIN t AS t2 ON t1.item = t2.item AND t1.name < t2.name
GROUP BY t1.name, t2.name

Output:

name1   name2       cnt
------------------------
Alice   Bob         3
Alice   Charlie     2
Bob     Charlie     2

Now all you want is to filter out (name1, name2) pairs having a count that is not equal to the number of items of name1 and name2. You can do this using a HAVING clause with correlated subqueries:

SELECT t1.name AS name1, t2.name AS name2
FROM t AS t1
INNER JOIN t AS t2 ON t1.item = t2.item AND t1.name < t2.name
GROUP BY t1.name, t2.name
HAVING COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t1.name) AND 
       COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t2.name)

Demo here

like image 199
Giorgos Betsos Avatar answered Sep 23 '22 10:09

Giorgos Betsos


With compound queries:

SELECT t1.name, t2.name
FROM t AS t1, t AS t2
GROUP BY t1.name, t2.name
HAVING t1.name < t2.name
   AND NOT EXISTS (SELECT item FROM t WHERE name = t1.name
                   EXCEPT
                   SELECT item FROM t WHERE name = t2.name)
   AND NOT EXISTS (SELECT item FROM t WHERE name = t2.name
                   EXCEPT
                   SELECT item FROM t WHERE name = t1.name);

Using NOT IN is possible, bit expresses exactly the same mechanism with more complexity:

SELECT t1.name, t2.name
FROM t AS t1, t AS t2
GROUP BY t1.name, t2.name
HAVING t1.name < t2.name
   AND NOT EXISTS (SELECT item
                   FROM t
                   WHERE name = t1.name
                     AND item NOT IN (SELECT item
                                      FROM t
                                      WHERE name = t2.name))
   AND NOT EXISTS (SELECT item
                   FROM t
                   WHERE name = t2.name
                     AND item NOT IN (SELECT item
                                      FROM t
                                      WHERE name = t1.name));
like image 21
CL. Avatar answered Sep 22 '22 10:09

CL.