Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres array query

Tags:

sql

postgresql

(The following is a highly simplified description of my problem. The company policy does not allow me to describe the actual scenario in any detail.)

The DB tables involved are:

PRODUCTS:
ID   Name
---------
1    Ferrari
2    Lamborghini
3    Volvo


CATEGORIES:
ID    Name
----------
10    Sports cars
20    Safe cars
30    Red cars

PRODUCTS_CATEGORIES
ProductID    CategoryID
-----------------------
1            10
1            30
2            10
3            20

LOCATIONS:
ID      Name
------------
100     Sports car store
200     Safe car store
300     Red car store
400     All cars r us


LOCATIONS_CATEGORIES:
LocationID    CategoryID
------------------------
100           10
200           20
300           30
400           10
400           20
400           30

Note that the locations are not directly connected to the products, only the categories. The customer should be able to see a list of locations that can provide all the product categories that the products they want to buy belong to. So, for example:

A customer wants to buy a Ferrari. This would be available from stores in categories 10 or 30. This gives us stores 100, 300 and 400 but not 200.

However, if a customer wants to buy a Volvo and a Lamborghini this would be available from stores in categories 10 and 20. Which only gives us store 400.

Another customer wants to buy a Ferrari and a Volvo. This they could get from a store in either categories 10 + 20 (sporty and safe) or categories 30 + 20 (red and safe).

What I need is a postgres query that takes a number of products and returns the locations where all of them can be found. I got started with arrays and the <@ operator but got lost quickly. Here follows some example SQL that attempts to get stores where a Ferrari and a Lamborghini can be bought. It does not work correctly since it requires the locations to satisfy all the categories that all the selected cars belong to. It returns location 400 only but should return locations 400 and 100.

SELECT l.* FROM locations l
WHERE 
(SELECT array_agg(DISTINCT(categoryid)) FROM products_categories WHERE productid IN (1,2))
<@
(SELECT array_agg(categoryid) FROM locations_categories WHERE locationid = l.id);

I hope my description makes sense.

like image 417
Henrik Söderlund Avatar asked Feb 10 '23 03:02

Henrik Söderlund


2 Answers

Here is the query. You should insert a list of selected cars Ids pc.ProductId in (1,3) and in the end you should correct condition to selected cars count so if you select 1 and 3 you should write HAVING COUNT(DISTINCT pc.ProductId) = 2 if you select 3 cars then there have to be 3. This condition in HAVING give you condition that ALL cars are in these Locations:

SELECT Id FROM Locations l
JOIN Locations_Categories lc on l.Id=lc.LocationId
JOIN Products_Categories pc on lc.CategoryId=pc.CategoryID
where pc.ProductId in (1,3)
GROUP BY l.id
HAVING COUNT(DISTINCT pc.ProductId) = 2

Sqlfiddle demo

For example for one car it will be:

SELECT Id FROM Locations l
JOIN Locations_Categories lc on l.Id=lc.LocationId
JOIN Products_Categories pc on lc.CategoryId=pc.CategoryID
where pc.ProductId in (1)
GROUP BY l.id
HAVING COUNT(DISTINCT pc.ProductId) = 1

Only Ferrary demo Volvo and a Lamborghini demo

like image 165
valex Avatar answered Feb 13 '23 12:02

valex


(This basically elaborates on @valex's answer, though I didn't realise that until I posted; please accept @valex's not this one).


This can be done using only joins and aggregation.

Build a join tree mapping locations to products, as normal. Then join it with the list of desired products (one-column values rows) and filter the join to only matching product names. You now have one row with the location of a product wherever that product can be found.

Now group by location and return locations where the number of products present is equal to the number we're looking for (for ALL). For ANY we omit the HAVING filter because any location row returned by the join is what we want.

So:

WITH wantedproducts(productname) AS (VALUES('Volvo'), ('Lamborghini'))
SELECT l."ID"
FROM locations l
INNER JOIN locations_categories lc ON (l."ID" = lc."LocationID")
INNER JOIN categories c ON (c."ID" = lc."CategoryID")
INNER JOIN products_categories pc ON (pc."CategoryID" = c."ID")
INNER JOIN products p ON (p."ID" = pc."ProductID")
INNER JOIN wantedproducts wp ON (wp.productname = p."Name")
GROUP BY l."ID"
HAVING count(DISTINCT p."ID") = (SELECT count(*) FROM wantedproducts);

is what you want, basically.

For "stores with any of the wanted products" queries, drop the HAVING clause.

You an also ORDER BY the aggregate if you want to show stores with any match but sort based on number of matches.

You can also add a string_agg(p."Name") to the SELECT values-list if you want to list products that can be found at that store.

If you want your input to be an array rather than a values-list, just replace the VALUES (...) with SELECT unnest($1) and pass your array as the parameter $1, or write it literally in place of $1.

like image 20
Craig Ringer Avatar answered Feb 13 '23 14:02

Craig Ringer