MySQL and faceted navigation (filter by attributes)

I feel like this question has probably been asked a thousand times already, so I apologize if it's been answered. And if so, can someone point me to the right posts/links?

What I'm trying to do is build a faceted navigation for my site. It uses MySQL and here's a rough sketch of the tables I'm using:

products:
- id
- title
- description
attributes:
- product_id
- name
- value
categories:
- id
- name
products_to_categories:
- product_id
- category_id

What I want to do is display a list of available attributes when you are in a category, allowing you to select one or more values for each of those attributes. To give you an example, look at this page from Office Depot: http://www.officedepot.com/a/browse/binders/N=5+2177/

So far I've used a lot of joins to filter on multiple attributes:

SELECT products.*, a_options.*
FROM products_to_categories AS pc, products,
attributes AS a_options,    /* list of attribute/value pairs I can continue to refine on */
attributes AS a_select1,    /* first selected attribute */
attributes AS a_select2     /* second selected attribute */
...
WHERE pc.category_id = 1
AND products.id = pc.product_id
AND a_options.product_id = products.id
AND a_options.name != 'Color' AND a_options.name != 'Size'
AND a_select1.product_id = products.id
AND a_select1.name = 'Color' AND (a_select1.value = 'Blue' OR a_select1.value = 'Black')
AND a_select2.product_id = products.id
AND a_select2.name = 'Size' AND a_select2.value = '8.5 x 11'

Basically a_options will return all the attributes for those products that are a subset of the filters I've applied using a_select1 and a_select2. So if I use the Binders example from Office Depot, I want to show all available attributes after selecting Blue or Black for Color and "8.5 x 11" for the Size.

I then use PHP code to remove duplicates and arrange the resulting attributes into an array like this:

attributes[name1] = (val1, val2, val3, ...)
attributes[name2] = (val1, val2, val3, ...)
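
For illustration, the grouping step could look roughly like this. It is sketched in Python rather than PHP, and the sample rows and the `group_attributes` helper are hypothetical, not the asker's actual code:

```python
from collections import defaultdict

def group_attributes(rows):
    """Collapse (name, value) rows into {name: [unique values, first-seen order]}."""
    grouped = defaultdict(dict)  # inner dict acts as an insertion-ordered set
    for name, value in rows:
        grouped[name][value] = None
    return {name: list(values) for name, values in grouped.items()}

# Hypothetical (name, value) pairs, as the a_options columns might come back.
rows = [
    ("Material", "Vinyl"),
    ("Material", "Poly"),
    ("Material", "Vinyl"),  # duplicate contributed by a second product
    ("Ring Type", "Round"),
]

attributes = group_attributes(rows)
print(attributes)
```

Every duplicate row still has to travel from the database to the application before it is discarded, which is part of why this step gets expensive as the result set grows.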

Is there a way I can speed up my query or write it more efficiently? I have set up indexes on the name and value columns in the attributes table (and also on all the ID columns). But if someone selects a couple of attributes, the query runs slowly.

Thanks for your help in advance,
Sridhar

Sridhar Balasubramanian asked Dec 09 '22 18:12


2 Answers

"I then use PHP code to remove duplicates"

It will not scale then.

After I read The Data Warehouse Toolkit (http://www.amazon.com/Data-Warehouse-Toolkit-Techniques-Dimensional/dp/0471153370), I was rolling out facets and filtering mechanisms non-stop.

The basic idea is to use a star schema.

You create a fact table that stores facts:

customerid | dateregisteredid | datelastloginid
1 | 1 | 1
2 | 1 | 2

The fact table's columns are foreign keys into dimension tables that store the attributes:

date_registered
id | weekday | weeknumber | year | month | month_year | daymonth | daymonthyear
1  | Wed     | 2          | 2009 | 2     | 2-2009     | 4        | 4-2-2009

Then, whichever date "paradigm" you are using, grab all the ids from that dimension table and

 select * from the fact table where fact.dateregisteredid is IN ( ... the ids from the date dimension table that represent your time period)
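
A minimal sketch of that lookup, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The fact table name `customer_facts`, the trimmed-down column list, and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: one row per registration date, pre-split into attributes.
CREATE TABLE date_registered (
    id INTEGER PRIMARY KEY,
    weekday TEXT,
    year INTEGER,
    month INTEGER
);
-- Fact table: holds only keys that point into the dimension tables.
CREATE TABLE customer_facts (
    customerid INTEGER PRIMARY KEY,
    dateregisteredid INTEGER REFERENCES date_registered(id)
);
INSERT INTO date_registered VALUES (1, 'Wed', 2009, 2), (2, 'Thu', 2009, 3);
INSERT INTO customer_facts VALUES (1, 1), (2, 1), (3, 2);
""")

# Pick the dimension ids for the period, then filter the fact table by them.
rows = conn.execute(
    """
    SELECT f.customerid
    FROM customer_facts AS f
    WHERE f.dateregisteredid IN (
        SELECT d.id
        FROM date_registered AS d
        WHERE d.year = ? AND d.month = ?
    )
    ORDER BY f.customerid
    """,
    (2009, 2),
).fetchall()

customer_ids = [row[0] for row in rows]
print(customer_ids)  # customers registered in February 2009
```

The filter only ever compares integer keys on the fact table; all of the descriptive date columns live in the small dimension table.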

These "indexed views" of your data should reside in a separate database, and a change to an object in production should queue that record for re-indexing in the analytics system. Large sites might batch their records at non-peak times, so the stats reporting application always lags behind by a few hours or days. I always try to keep it up to the second, if the architecture supports it.

If you are displaying row-count previews, you might have quite a bit of optimization or caching to implement as well.

Basically, to sum it up, you copy data and denormalize. The technique goes by the name "data warehousing" or OLAP (online analytical processing).

There are better ways, using commercial databases like Oracle, but the star schema makes it available to anyone with an open source relational database and some time.

You should definitely read the toolkit; the author discusses a lot of things that can save you considerable time, like strategies for dealing with updated data and retaining audit history in the reporting application. For every problem he outlines multiple solutions, each of which is applicable in a different context.

It can scale up to millions of rows, as long as you don't take the easy way out and use a ton of needless joins.

Josh Ribakoff answered Dec 15 '22 00:12


You can generate a facet table based on your normalized database tables.
For example:

> SELECT * FROM product_facet
product_id | facet_type | facet_value
1          | color      | blue
2          | color      | blue
3          | color      | green
4          | color      | yellow
1          | speed      | slow
2          | speed      | slow

Then simply run this query to get the total per facet value:

SELECT facet_type, facet_value, COUNT(facet_value) as total
FROM product_facet
GROUP BY facet_type, facet_value;

Result:

facet_type | facet_value | total
color      | blue        | 2
color      | green       | 1
color      | yellow      | 1
speed      | slow        | 2
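
As a rough sketch of how such a facet table might be generated, the snippet below fills `product_facet` from the `attributes` table in the question and then runs the same GROUP BY. It assumes Python's sqlite3 module in place of MySQL; the sample rows and the choice to lower-case names and values are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source table, shaped like the one in the question.
CREATE TABLE attributes (product_id INTEGER, name TEXT, value TEXT);
INSERT INTO attributes VALUES
    (1, 'Color', 'Blue'), (2, 'Color', 'Blue'), (3, 'Color', 'Green'),
    (4, 'Color', 'Yellow'), (1, 'Speed', 'Slow'), (2, 'Speed', 'Slow');

-- Denormalized facet table; the composite key rules out duplicate rows.
CREATE TABLE product_facet (
    product_id INTEGER,
    facet_type TEXT,
    facet_value TEXT,
    PRIMARY KEY (product_id, facet_type, facet_value)
);
INSERT INTO product_facet (product_id, facet_type, facet_value)
SELECT DISTINCT product_id, LOWER(name), LOWER(value) FROM attributes;
""")

# Same aggregate as in the answer, with an ORDER BY for stable output.
totals = conn.execute(
    """
    SELECT facet_type, facet_value, COUNT(*) AS total
    FROM product_facet
    GROUP BY facet_type, facet_value
    ORDER BY facet_type, facet_value
    """
).fetchall()

for facet_type, facet_value, total in totals:
    print(facet_type, facet_value, total)
```

With the duplicates removed once at load time, the database returns one row per facet value and no de-duplication is needed in application code.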

When searching with criteria, you can restrict the facet table to the matching product ids:

SELECT facet_type, facet_value, COUNT(facet_value) as total
FROM product_facet
WHERE product_id IN (SELECT id FROM products WHERE ... )
GROUP BY facet_type, facet_value;
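
One way the filtered case from the question (Color is Blue or Black, and Size is 8.5 x 11) might be expressed against the facet table is sketched below. It assumes Python's sqlite3 module in place of MySQL, hypothetical sample rows, and a subquery on `product_facet` itself rather than on `products`; the HAVING clause requires a product to match every selected facet:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_facet (product_id INTEGER, facet_type TEXT, facet_value TEXT);
INSERT INTO product_facet VALUES
    (1, 'color', 'blue'),  (1, 'size', '8.5 x 11'),  (1, 'material', 'vinyl'),
    (2, 'color', 'black'), (2, 'size', '8.5 x 11'),  (2, 'material', 'poly'),
    (3, 'color', 'green'), (3, 'size', '8.5 x 11'),  (3, 'material', 'vinyl'),
    (4, 'color', 'blue'),  (4, 'size', '5.5 x 8.5'), (4, 'material', 'poly');
""")

# Hypothetical user selection: OR within a facet, AND across facets.
selected = {"color": ["blue", "black"], "size": ["8.5 x 11"]}

clauses, params = [], []
for facet_type, values in selected.items():
    placeholders = ", ".join("?" for _ in values)
    clauses.append(f"(facet_type = ? AND facet_value IN ({placeholders}))")
    params += [facet_type, *values]
where = " OR ".join(clauses)
excluded = ", ".join("?" for _ in selected)

sql = f"""
    SELECT facet_type, facet_value, COUNT(*) AS total
    FROM product_facet
    WHERE product_id IN (
        SELECT product_id
        FROM product_facet
        WHERE {where}
        GROUP BY product_id
        -- keep only products that matched every selected facet type
        HAVING COUNT(DISTINCT facet_type) = ?
    )
    AND facet_type NOT IN ({excluded})
    GROUP BY facet_type, facet_value
    ORDER BY facet_type, facet_value
"""
params += [len(selected), *selected]

remaining = conn.execute(sql, params).fetchall()
print(remaining)  # facet values still available for further refinement
```

The query stays the same shape however many facets are selected, which avoids adding one self-join of the attributes table per selected attribute.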

Stacker answered Dec 14 '22 23:12