I want to find the distinct pairs of names in the table which have the same exact items in the items column. For instance:
CREATE TABLE t
(
name VARCHAR(255),
item VARCHAR(255)
);
INSERT INTO t VALUES("Alice", "Orange");
INSERT INTO t VALUES("Alice", "Pear");
INSERT INTO t VALUES("Alice", "Lemon");
INSERT INTO t VALUES("Bob", "Orange");
INSERT INTO t VALUES("Bob", "Pear");
INSERT INTO t VALUES("Bob", "Lemon");
INSERT INTO t VALUES("Charlie", "Pear");
INSERT INTO t VALUES("Charlie", "Lemon");
The answer here would be Alice,Bob
because they took the exact same items.
I want to do it with double negation (using NOT EXISTS/NOT IN) only which I think is more well-suited to this question, but I couldn't come up with anything that is remotely close to being functional.
This is somewhat similar to this question but I'm using SQLite so I cannot use GROUP_CONCAT() but I was wondering how it would be done using relational division using NOT EXISTS/NOT IN.
To get the number of common items between all pairs of names you can use the following query:
SELECT t1.name AS name1, t2.name AS name2, COUNT(*) AS cnt
FROM t AS t1
INNER JOIN t AS t2 ON t1.item = t2.item AND t1.name < t2.name
GROUP BY t1.name, t2.name
Output:
name1 name2 cnt
------------------------
Alice Bob 3
Alice Charlie 2
Bob Charlie 2
Now all you want is to filter out (name1, name2)
pairs having a count that is not equal to the number of items of name1
and name2
. You can do this using a HAVING
clause with correlated subqueries:
SELECT t1.name AS name1, t2.name AS name2
FROM t AS t1
INNER JOIN t AS t2 ON t1.item = t2.item AND t1.name < t2.name
GROUP BY t1.name, t2.name
HAVING COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t1.name) AND
COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t2.name)
Demo here
With compound queries:
SELECT t1.name, t2.name
FROM t AS t1, t AS t2
GROUP BY t1.name, t2.name
HAVING t1.name < t2.name
AND NOT EXISTS (SELECT item FROM t WHERE name = t1.name
EXCEPT
SELECT item FROM t WHERE name = t2.name)
AND NOT EXISTS (SELECT item FROM t WHERE name = t2.name
EXCEPT
SELECT item FROM t WHERE name = t1.name);
Using NOT IN is possible, bit expresses exactly the same mechanism with more complexity:
SELECT t1.name, t2.name
FROM t AS t1, t AS t2
GROUP BY t1.name, t2.name
HAVING t1.name < t2.name
AND NOT EXISTS (SELECT item
FROM t
WHERE name = t1.name
AND item NOT IN (SELECT item
FROM t
WHERE name = t2.name))
AND NOT EXISTS (SELECT item
FROM t
WHERE name = t2.name
AND item NOT IN (SELECT item
FROM t
WHERE name = t1.name));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With