I'm trying to write code to pull a list of product items from a SQL Server database an display the results on a webpage.
A requirement of the project is that a list of categories is displayed at the right hand side of the page as a list of checkboxes (all categories selected by default) and a user can uncheck categories and re-query the database to view products's in only the categories they want.
Heres where it starts to get a bit hairy.
Each product can be assinged to multiple categories using a product categories table as below...
Product table [product_id](PK),[product_name],[product_price],[isEnabled],etc... Category table [CategoryID](PK),[CategoryName] ProductCagetory table [id](PK),[CategoryID](FK),[ProductID](FK)
I need to select a list of products that match a set of category ID's passed to my stored procedure where the products have multiple assigned categories.
The categort id's are passed to the proc as a comma delimited varchar i.e. ( 3,5,8,12 )
The SQL breaks this varchar value into a resultset in a temp table for processing.
How would I go aout writing this query?
One problem is passing the array or list of selected categories into the server. The subject was covered at large by Eland Sommarskog in the series of articles Arrays and Lists in SQL Server. Passing the list as a comma separated string and building a temp table is one option. There are alternatives, like using XML, or a Table-Valued-Parameter (in SQL Server 2008) or using a table @variable instead of a #temp table. The pros and cons of each are covered in the article(s) I linked.
Now on how to retrieve the products. First things first: if all categories are selected then use a different query that simply retrieves all products w/o bothering with categories at all. This will save a lot of performance and considering that all users will probably first see a page w/o any category unselected, the saving can be significant.
When categories are selected, then building a query that joins products, categories and selected categories is fairly easy. Making it scale and perform is a different topic, and is entirely dependent on your data schema and actual pattern of categories selected. A naive approach is like this:
select ...
from Products p
where p.IsEnabled = 1
and exists (
select 1
from ProductCategories pc
join #selectedCategories sc on sc.CategoryID = pc.CategoryID
where pc.ProductID = p.ProductID);
The ProductsCategoriestable must have an index on (ProductID, CategoryID)
and one on (CategoryID, ProductID)
(one of them is the clustered, one is NC). This is true for every solution btw. This query would work if most categories are always selected and the result contains most products anyway. But if the list of selected categories is restrictive then is better to avoid the scan on the potentially large Products table and start from the selected categories:
with distinctProducts as (
select distinct pc.ProductID
from ProductCategories pc
join #selectedCategories sc on pc.CategoryID = sc.CategoryID)
select p.*
from Products p
join distinctProducts dc on p.ProductID = dc.ProductID;
Again, the best solution depends largely on the shape of your data. For example if you have a very skewed category (one categoru alone covers 99% of products) then the best solution would have to account for this skew.
This gets all products that are at least in all of the desired categories (no less):
select * from product p1 join (
select p.product_id from product p
join ProductCategory pc on pc.product_id = p.product_id
where pc.category_id in (3,5,8,12)
group by p.product_id having count(p.product_id) = 4
) p2 on p1.product_id = p2.product_id
4
is the number of categories in the set.
This gets all products that are exactly in all of the desired categories (no more, no less):
select * from product p1 join (
select product_id from product p1
where not exists (
select * from product p2
join ProductCategory pc on pc.product_id = p2.product_id
where p1.product_id = p2.product_id
and pc.category_id not in (3,5,8,12)
)
group by product_id having count(product_id) = 4
) p2 on p1.product_id = p2.product_id
The double negative can be read as: get all products for which there are no categories that are not in the desired category list.
For the products in any of the desired categories, it's as simple as:
select * from product p1 where exists (
select * from product p2
join ProductCategory pc on pc.product_id = p2.product_id
where
p1.product_id = p2.product_id and
pc.category_id in (3,5,8,12)
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With