How to efficiently count totals for filter options

Tags: caching, filter

I am working on a webshop-type application. One feature I often see on other websites is a breakdown of filtering options, each followed by a total of how many results that option will return. You often see this on computer hardware sites (e.g. Newegg) or used car sites. Example:

CPU:
  * AMD (315)
  * Intel (455)

Video card:
  * ATI (378)
  * Nvidia (402)

How can I efficiently calculate these totals? The website I am working on will have many different products (10,000+) with many different options. To make matters worse, the products are constantly changing.

Trying to precalculate all the different filtering combination totals seems infeasible. If I have 5 different filters with 4 options each, the number of option possibilities would be 20 * 16 * 12 * 8 * 4 = 122,880. It would take a long time to calculate all of those.

Another option would be to query on demand and cache the results (e.g. in Redis). But how could I manage the cache efficiently when products keep being added and removed? The caches would often be stale. I'm afraid I'd have to micro-manage cache invalidation somehow, leading to a very complex and brittle implementation. The alternative would be to invalidate broad sections of the cache, but immediately after invalidating, my database would be swamped by hundreds of queries from active users who need these totals recalculated.
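For illustration, this is roughly the query-on-demand-and-cache approach I have in mind. It is only a minimal Python sketch: the `products` table, the column names, the local Redis instance and the short-TTL idea (letting counts go slightly stale instead of invalidating explicitly) are all assumptions, not a working design.

```python
import json
import sqlite3  # stand-in for the real product database

import redis

cache = redis.Redis()  # assumes a Redis instance on localhost


def facet_counts(column, ttl=60):
    """Option counts for one filter, cached in Redis for a short TTL.

    Instead of explicit invalidation, the counts are allowed to be up to
    `ttl` seconds stale; the database is hit at most once per expiry.
    """
    key = f"facet:{column}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # `column` must come from a fixed whitelist, never from user input.
    db = sqlite3.connect("shop.db")  # hypothetical database/schema
    rows = db.execute(
        f"SELECT {column}, COUNT(*) FROM products GROUP BY {column}"
    ).fetchall()
    counts = dict(rows)

    cache.setex(key, ttl, json.dumps(counts))
    return counts
```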

Is there a nice and elegant way to handle this?

Asked Oct 23 '12 by Sander Marechal


1 Answer

I see no problem with showing live data in your case. Not to discourage you in any way, but 10K products is not a lot, performance-wise. Several million, on the other hand, is.

Did you actually try implementing it this way and find that it performs slowly, or are you just worried about its theoretical performance? I suggest you do some stress testing on your system as it is and see whether it's worth improving. Still, here are some ideas to make it faster:

  1. Do not populate all counts at once; only fetch them when a specific category is expanded/clicked. That way you always end up with a single `SELECT cat_name, COUNT(*) FROM ... GROUP BY cat_name` query, which should not take much time. A single, relatively light query like this per user click sounds reasonable to me (see the sketch after this list).

  2. Let the database engine manage caching for you. If you perform similar queries often, your database engine should automatically cache the relevant data (e.g. keep that whole table in memory or similar). You just need to make sure the instance has enough memory.

  3. Upgrade server hardware, if needed. If the amount of data increases, you may not have enough memory to store everything. Don't panic yet: you can still put in an SSD or install a 12-core Xeon processor, depending on where the bottleneck is.
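To make point 1 concrete, here is a minimal sketch of such a per-click count query, with the user's current filter selection applied so the totals match the narrowed result set. The `products` table, the column names and the `sqlite3` stand-in driver are assumptions about the schema, not a definitive implementation.

```python
import sqlite3  # stand-in; with MySQL use your own driver and its placeholders

# Whitelist of filterable columns; never interpolate raw user input into SQL.
FILTER_COLUMNS = ("cpu_brand", "gpu_brand")  # hypothetical column names


def option_counts(db, column, current_filters=None):
    """Live counts for one filter's options, computed when it is expanded.

    `current_filters` holds the already-selected options, e.g.
    {"cpu_brand": "AMD"}, so the counts reflect the narrowed result set.
    """
    if column not in FILTER_COLUMNS:
        raise ValueError(f"unknown filter: {column}")

    selected = {c: v for c, v in (current_filters or {}).items()
                if c in FILTER_COLUMNS and c != column}

    where, params = "", []
    if selected:
        where = " WHERE " + " AND ".join(f"{c} = ?" for c in selected)
        params = list(selected.values())

    sql = f"SELECT {column}, COUNT(*) FROM products{where} GROUP BY {column}"
    return dict(db.execute(sql, params).fetchall())


# Example: counts for the "Video card" filter with CPU already narrowed to AMD
# db = sqlite3.connect("shop.db")
# print(option_counts(db, "gpu_brand", {"cpu_brand": "AMD"}))
```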

Answered Jan 03 '23 by Neolisk