For certain filtering sub conditions on a products table I want to find out the distinct categories of those filtered products (millions of products).
The clear way of doing it
SELECT DISTINCT category_id FROM "products" WHERE _conditions_
takes long to complete when there are a lot of rows involved (it's not really changing much to use GROUP BY
instead)
According to https://wiki.postgresql.org/wiki/Loose_indexscan when there are only relatively small amount of different values on the distinct col (like is the case here with ~30 categories), the condition
SELECT DISTINCT category_id FROM "products"
can be restated as a recursive CTE
WITH RECURSIVE t AS (
SELECT MIN(category_id) AS category_id FROM "products"
UNION ALL
SELECT (SELECT MIN(category_id) FROM "products" WHERE category_id > t.category_id)
FROM t WHERE t.category_id IS NOT NULL
)
SELECT category_id FROM t WHERE category_id IS NOT NULL
UNION ALL
SELECT NULL WHERE EXISTS(SELECT 1 FROM "products" WHERE category_id IS NULL);
That actually perfoms better in my use case (milisecons instead of seconds). But it is without the conditions.
How to add the 'WHERE _conditions_
' part properly to the recursive CTE?
WITH RECURSIVE t AS (
SELECT MIN(category_id) AS category_id FROM "products" WHERE _conditions_
UNION ALL
SELECT (SELECT MIN(category_id) FROM "products" WHERE category_id > t.category_id AND _conditions_ )
FROM t WHERE t.category_id IS NOT NULL
)
SELECT category_id FROM t WHERE category_id IS NOT NULL
UNION ALL
SELECT NULL WHERE EXISTS(SELECT 1 FROM "products" WHERE category_id IS NULL AND _conditions_);
Like this. There are three of that conditions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With