Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implement filters with counters

Tags:

sql

mysql

What I want to achieve:

I am developing website with a catalog of products.
This is normalized model (simplified) of entities which are related to my question:

enter image description here

So some product features exist (like size and type in this example), which all have predefined sets of values (e.g. sizes 1, 2 and 3 exist, and type may be 1, 2 or 3 (these sets do not have to be equal, just example.)).
Relationship between Product and each of features is "many-to-many" - different values of one feature do not exclude each other.
My task is to build form which will allow user to filter search results, based on features of products. Example screenshot:

enter image description here

Multiple checked values of one feature are mixed using "AND" logic, so if I have sizes One and Three checked, I need all products, which have both sizes (+ may have any other sizes, that doesn't matter, but selected ones must be present).

Number near each value of feature represents amount of products, which is returned if user checks this value right now. So it is effectively a number of products satisfying filter "current active filter + this one value applied".

When user checks/unchecks any value, counters must be updated considering new "current filter".

Problem:

Real use case is: ~200k products, ~6 features with ~5-15 values each.
My COUNT queries, (especially with decent number of selected options) are too slow, and to render the form I need as many of these counts as there are values of all filters - in total that gives unacceptable response time.

What I have tried:

  • Query to retrieve results:

    select * from products p, product_size ps
    where p.id = ps.product_id
    and (ps.size_id IN (1, 2, 3, 5))
    group by p.id
    having count(p.id) = 4;
    

(this is to select products which have sizes 1, 2, 3 and 5 at the same time).
It completes in ~0.360 sec on 120k products, almost same time with COUNT wrapped around it. And this query does not allow more than one feature (but I could place values of all features in one table).

  • Another query to retrieve the same set:

    SELECT ps1.product_id
    FROM product_size AS ps1, (SELECT id FROM size AS s1 WHERE id IN (1, 2, 3, 5)) AS t
    WHERE ps1.size_id = t.id
    GROUP BY ps1.product_id
    HAVING COUNT(ps1.size_id) = (SELECT COUNT(id) FROM (SELECT id FROM size AS s2 WHERE id IN (1, 2, 3, 5)) AS t2);
    

It completes in ~0.230 sec (same time when wrapped in COUNT) and does not allow multiple features too.
It is modified query I found here: https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/ (second query in "Division with a Remainder" part).

  • Alternative schema:
    enter image description here

Denormalized model, where value of each feature is a boolean column in Products table.
The query is obvious here:

    select * from products
    where `size_1` = 1 and `size_2` = 1
      and `size_3` = 1 and `size_5` = 1;

Weird and harder to maintain in application's code, but completes in ~0.056 sec when COUNT-ing.

None of these methods are acceptable per se because multiplied ~30 times (to populate all counters in form) that gives inadequate response time.

  • Caching and precomputing
    Data in DB is going to be updated only few times a day (like, may be, even 2), so I could probably precompute counts for all combinations of filters when data is updated (I haven't measured necessary time to be honest), but it is anyway not going to work too - search form has fields with arbitrary values (like min/max price and text search by the product's name), which I can't precompute for.

  • Load counters in form dynamically
    Render form, but fetch numbers through AJAX, so user would be able to see page, and then, after quite long waiting, numbers. This is my last thought, but it seems like poor quality of service for me (may be it is worse than no counters at all).

I am stuck. Any hints? May be I am not seeing some bigger picture? I would be very glad to any advice.

UPDATE: if we forget about counters, what is the effective and usually used way (query) for just retrieving results with such a filters (or what am I doing wrong)? Like "find post with all requested tags" model, that is equivalent. I suspect it can be faster than my 0.230 sec (query #2), considering small (?) amount of rows for MySQL.

like image 601
Oleg Arkhipov Avatar asked Jan 12 '17 07:01

Oleg Arkhipov


1 Answers

You can

  1. Create one table which will store all possible combinations (product_id <> size_id <> type_id)
  2. Update this table when Admin will make any changes in product from backend (assuming there will be a backend management)
  3. In frontend, for filters, use this table instead of product tables, and extract product ids once filter query is fired
  4. Once you have list of product ids for result, you can fetch actual data by using those product Ids

I have used this before, and it worked for me, you can first make table and try running query to check response time.

Hope this helps.

like image 79
Patrick R Avatar answered Oct 31 '22 06:10

Patrick R