How to efficiently count totals for filter options

Tags: caching, filter

I am working on a webshop-type application. One feature I often see on other websites is a breakdown of filtering options, each followed by a total of how many results that option will return. You often see this on computer hardware sites (e.g. Newegg) or used car sites. Example:

CPU:
  * AMD (315)
  * Intel (455)

Video card:
  * ATI (378)
  * Nvidia (402)

How can I efficiently calculate these totals? The website I am working on will have many different products (10,000+) with many different options. To make matters worse, the products are constantly changing.

Trying to precalculate all the different filtering combination totals seems infeasible. If I have 5 different filters with 4 options each, the number of option possibilities would be 20 * 16 * 12 * 8 * 4 = 122,880. It would take a long time to calculate all of those.

Another option would be to query on demand and cache the results (e.g. in Redis). But how could I manage the cache efficiently when products keep being added and removed? The caches would often be stale. I'm afraid I'd have to micro-manage cache invalidation somehow, leading to a very complex and brittle implementation. The alternative would be to invalidate broad sections of the cache, but immediately after invalidating, my database would be swamped by hundreds of queries from active users who need these totals recalculated.
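For illustration, this is roughly the query-on-demand-and-cache approach I have in mind. It is only a minimal Python sketch: the `products` table, the column names, the local Redis instance and the short-TTL idea (letting counts go slightly stale instead of invalidating explicitly) are all assumptions, not a working design.

```python
import json
import sqlite3  # stand-in for the real product database

import redis

cache = redis.Redis()  # assumes a Redis instance on localhost


def facet_counts(column, ttl=60):
    """Option counts for one filter, cached in Redis for a short TTL.

    Instead of explicit invalidation, the counts are allowed to be up to
    `ttl` seconds stale; the database is hit at most once per expiry.
    """
    key = f"facet:{column}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # `column` must come from a fixed whitelist, never from user input.
    db = sqlite3.connect("shop.db")  # hypothetical database/schema
    rows = db.execute(
        f"SELECT {column}, COUNT(*) FROM products GROUP BY {column}"
    ).fetchall()
    counts = dict(rows)

    cache.setex(key, ttl, json.dumps(counts))
    return counts
```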

Is there a nice and elegant way to handle this?

Asked Oct 23 '12 by Sander Marechal


1 Answer

I see no problem with showing live data in your case. Not to discourage you in any way, but 10K products is not a lot, performance-wise. Several million, on the other hand, is.

Did you actually try implementing it this way and find that it performs slowly, or are you just worried about its theoretical performance? I suggest you do some stress testing on your system as it is and see whether it's worth improving. Still, here are some ideas to make it faster:

  1. Do not populate all counts at once; only fetch them when a specific category is expanded/clicked. That way you always end up with a single `SELECT cat_name, COUNT(*) FROM ... GROUP BY cat_name` query, which should not take much time. A single, relatively light query like this per user click sounds reasonable to me (see the sketch after this list).

  2. Let the database engine manage caching for you. If you perform similar queries often, your database engine should automatically cache the relevant data (e.g. keep that whole table in memory or similar). You just need to make sure the instance has enough memory.

  3. Upgrade server hardware, if needed. If the amount of data increases, you may not have enough memory to store everything. Don't panic yet: you can still put in an SSD or install a 12-core Xeon processor, depending on where the bottleneck is.
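To make point 1 concrete, here is a minimal sketch of such a per-click count query, with the user's current filter selection applied so the totals match the narrowed result set. The `products` table, the column names and the `sqlite3` stand-in driver are assumptions about the schema, not a definitive implementation.

```python
import sqlite3  # stand-in; with MySQL use your own driver and its placeholders

# Whitelist of filterable columns; never interpolate raw user input into SQL.
FILTER_COLUMNS = ("cpu_brand", "gpu_brand")  # hypothetical column names


def option_counts(db, column, current_filters=None):
    """Live counts for one filter's options, computed when it is expanded.

    `current_filters` holds the already-selected options, e.g.
    {"cpu_brand": "AMD"}, so the counts reflect the narrowed result set.
    """
    if column not in FILTER_COLUMNS:
        raise ValueError(f"unknown filter: {column}")

    selected = {c: v for c, v in (current_filters or {}).items()
                if c in FILTER_COLUMNS and c != column}

    where, params = "", []
    if selected:
        where = " WHERE " + " AND ".join(f"{c} = ?" for c in selected)
        params = list(selected.values())

    sql = f"SELECT {column}, COUNT(*) FROM products{where} GROUP BY {column}"
    return dict(db.execute(sql, params).fetchall())


# Example: counts for the "Video card" filter with CPU already narrowed to AMD
# db = sqlite3.connect("shop.db")
# print(option_counts(db, "gpu_brand", {"cpu_brand": "AMD"}))
```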

Answered Jan 03 '23 by Neolisk