What I want to achieve:
I am developing website with a catalog of products.
This is normalized model (simplified) of entities which are related to my question:
So some product features exist (like size and type in this example), which all have predefined sets of values (e.g. sizes 1, 2 and 3 exist, and type may be 1, 2 or 3 (these sets do not have to be equal, just example.)).
Relationship between Product and each of features is "many-to-many" - different values of one feature do not exclude each other.
My task is to build form which will allow user to filter search results, based on features of products. Example screenshot:
Multiple checked values of one feature are mixed using "AND" logic, so if I have sizes One and Three checked, I need all products, which have both sizes (+ may have any other sizes, that doesn't matter, but selected ones must be present).
Number near each value of feature represents amount of products, which is returned if user checks this value right now. So it is effectively a number of products satisfying filter "current active filter + this one value applied".
When user checks/unchecks any value, counters must be updated considering new "current filter".
Problem:
Real use case is: ~200k products, ~6 features with ~5-15 values each.
My COUNT
queries, (especially with decent number of selected options) are too slow, and to render the form I need as many of these counts as there are values of all filters - in total that gives unacceptable response time.
What I have tried:
Query to retrieve results:
select * from products p, product_size ps
where p.id = ps.product_id
and (ps.size_id IN (1, 2, 3, 5))
group by p.id
having count(p.id) = 4;
(this is to select products which have sizes 1, 2, 3 and 5 at the same time).
It completes in ~0.360
sec on 120k products, almost same time with COUNT
wrapped around it. And this query does not allow more than one feature (but I could place values of all features in one table).
Another query to retrieve the same set:
SELECT ps1.product_id
FROM product_size AS ps1, (SELECT id FROM size AS s1 WHERE id IN (1, 2, 3, 5)) AS t
WHERE ps1.size_id = t.id
GROUP BY ps1.product_id
HAVING COUNT(ps1.size_id) = (SELECT COUNT(id) FROM (SELECT id FROM size AS s2 WHERE id IN (1, 2, 3, 5)) AS t2);
It completes in ~0.230
sec (same time when wrapped in COUNT
) and does not allow multiple features too.
It is modified query I found here: https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/ (second query in "Division with a Remainder" part).
Denormalized model, where value of each feature is a boolean column in Products table.
The query is obvious here:
select * from products
where `size_1` = 1 and `size_2` = 1
and `size_3` = 1 and `size_5` = 1;
Weird and harder to maintain in application's code, but completes in ~0.056
sec when COUNT
-ing.
None of these methods are acceptable per se because multiplied ~30 times (to populate all counters in form) that gives inadequate response time.
Caching and precomputing
Data in DB is going to be updated only few times a day (like, may be, even 2), so I could probably precompute counts for all combinations of filters when data is updated (I haven't measured necessary time to be honest), but it is anyway not going to work too - search form has fields with arbitrary values (like min/max price and text search by the product's name), which I can't precompute for.
Load counters in form dynamically
Render form, but fetch numbers through AJAX, so user would be able to see page, and then, after quite long waiting, numbers. This is my last thought, but it seems like poor quality of service for me (may be it is worse than no counters at all).
I am stuck. Any hints? May be I am not seeing some bigger picture? I would be very glad to any advice.
UPDATE: if we forget about counters, what is the effective and usually used way (query) for just retrieving results with such a filters (or what am I doing wrong)? Like "find post with all requested tags" model, that is equivalent. I suspect it can be faster than my 0.230
sec (query #2), considering small (?) amount of rows for MySQL.
You can
I have used this before, and it worked for me, you can first make table and try running query to check response time.
Hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With