I have the following example data structure of customer that can be part of multiple groups using a junction table and data:
CREATE TABLE customer(id) AS VALUES (0),(1),(2),(3);
CREATE TABLE groups(id) AS VALUES (1),(3),(5),(6);
CREATE TABLE customers_to_groups(customer_id, group_id) AS
VALUES (0, 1)--customer 0 is in group (5 OR 6) AND (1 OR 3)
,(0, 5)--customer 0 is in group (5 OR 6) AND (1 OR 3)
,(1, 1)
,(1, 90)
,(2, 1)
,(3, 3)--customer 3 is in group (5 OR 6) AND (1 OR 3)
,(3, 5)--customer 3 is in group (5 OR 6) AND (1 OR 3)
,(3, 90);
I need to get customers that have specific groups they are part of, and I need to get a list of all customers that are part of at least 1 group in multiple lists of group. For example I want to get all customers that are in group (5 OR 6) AND (1 OR 3), so for example a customer in group 5 and 1 wold be a returned, but somebody in group 1 and 90 or just group 1 not. With the provided sample data we would get the customer of id 0 and 3 only as they conform to the given rules above.
Just doing WHERE group_id IN (5,6) AND group_id IN (1,3) does not seem to work, so I am looking for alternative.
I got this so far that works:
SELECT DISTINCT c.id
FROM customer c
INNER JOIN customers_to_groups at1 ON c.id = at1.customer_id
INNER JOIN customers_to_groups at2 ON c.id = at2.customer_id
WHERE at1.group_id IN (5, 6)
AND at2.group_id IN (1, 3);
Expected Results:
| id |
|---|
| 0 |
| 3 |
Is there a way to do it that is more performant?
We can GROUP BY customer's id and use a HAVING clause. There we can use CASE or FILTER if your RDBMS supports it. Postgres should.
There your conditions will be set.
The query will be:
SELECT c.id
FROM customer c
INNER JOIN customers_to_groups ctg
ON c.id = ctg.customer_id
AND ctg.group_id IN (1,3,5,6)
GROUP BY c.id
HAVING
COUNT(CASE WHEN ctg.group_id IN (1,3) THEN 1 END) > 0
AND COUNT(CASE WHEN ctg.group_id IN (5,6) THEN 1 END) > 0;
or
SELECT c.id
FROM customer c
INNER JOIN customers_to_groups ctg
ON c.id = ctg.customer_id
AND ctg.group_id IN (1,3,5,6)
GROUP BY c.id
HAVING
COUNT(*) FILTER(WHERE ctg.group_id IN (1,3)) > 0
AND COUNT(*) FILTER(WHERE ctg.group_id IN (5,6)) > 0;
Note: Above queries assume you really need to include both customers and customers_to_groups table and join them. If you don't need to include the customers table, just remove it and select from the table customers_to_groups only to improve the performance:
SELECT customer_id
FROM customers_to_groups ctg
WHERE ctg.group_id IN (1,3,5,6)
GROUP BY customer_id
HAVING
COUNT(*) FILTER(WHERE ctg.group_id IN (1,3)) > 0
AND COUNT(*) FILTER(WHERE ctg.group_id IN (5,6)) > 0;
This demo with your sample data and a lot of additional rows shows the performance differences.
Using FILTER or CASE is far faster (as it avoids a second JOIN on the table customers_to_groups). The exact performance difference depends on the real data in your tables and which index(es) you use.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With