Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a faceted search with SQL Server

I have an application which I will be accessing SQL server to return data which has been filtered by selections from the application as any common faceted search. I did see some out the box solutions, but these are expensive and I prefer building out something custom, but just don't know where to start.

The database structure is like this: enter image description here

The data from the PRODUCT table would be searched by tags from the TAG table. Values which would be found in the TAG table would be something like this:

 ID      NAME
 ----------------------
 1       Blue
 2       Green
 3       Small
 4       Large
 5       Red

They would be related to products through the ProductTag table.

I would need to return two groups of data from this setup:

  1. The Products that are only related to the Tags selected, whether single or multiple
  2. The Remaining tags that are also available to select for the products which have already been refined by single or multiple selected tags.

I would like this to be all with-in SQL server if possible, 2 seperate as stored procedures.

Most websites have this feature built into it these days, ie: http://www.gnc.com/family/index.jsp?categoryId=2108294&cp=3593186.3593187 (They've called it 'Narrow By')

I have been searching for a while how to do this, and I'm taking a wild guess that if a stored procedure has to be created in this nature, that there would need to be 1 param that accepts CSV values, like this:

 [dbo].[GetFacetedProducts] @Tags_Selected = '1,3,5'
 [dbo].[GetFacetedTags] @Tags_Selected = '1,3,5'

So with this architecture, does anyone know what types of queries need to be written for these stored procedures, or is the architecture flawed in any way? Has anyone created a faceted search before that was like this? If so, what types of queries would be needed to make something like this? I guess I'm just having trouble wrap my head around it, and there isn't much out there that shows someone how to make something like this.

like image 722
Control Freak Avatar asked Aug 21 '13 01:08

Control Freak


People also ask

What is a database faceted search?

Faceted search, or faceted navigation, is a way of browsing and searching for items in a set of data by applying filters on various properties (facets) of the items in the collection.

How do you do multiple search in SQL?

Yes, you can use SQL IN operator to search multiple absolute values: SELECT name FROM products WHERE name IN ( 'Value1', 'Value2', ... );


2 Answers

A RDBMS for being used for faceted searching is the wrong tool for the job at hand. Faceted searching is a multidimensional search, which is difficult to express in the set-based SQL language. Using a data-cube or the like might give you some of the desired functionality, but would be quite a bit of work to build.

When we were faced with similar requirements we ultimately decided to utilize the Apache Solr search engine, which supports faceting as well as many other search-oriented functions and features.

like image 63
STW Avatar answered Sep 19 '22 17:09

STW


It is possible to do faceted search in SQL Server. However don't try to use your live product data tables. Instead create a de-normalised "fact" table which holds every product (rows) and every tag (columns) so that the intersection is your product-tag values. You can re-populate this periodically from your main product table.

It is then straightforward and relatively efficient to get the facet counts for the matching records for each tag the user checks.

The approach I have described will be perfectly good for small cases, e.g. 1,000 product rows and 50-100 tags (attributes). Also there is an interesting opportunity with the forthcoming SQL Server 2014, which can place tables in memory - that should allow much larger fact tables.

I have also used Solr, and as STW points out this is the "correct" tool for facet searches. It is orders of magnitude faster than a SQL Server solution.

However there are some major disadvantages to using Solr. The main issue is that you have to setup not only another platform (Solr) but also all the paraphernalia that goes with it - Java and some kind of Java servlet (of which there are several). And whilst Solr runs on Windows quite nicely, you will still soon find yourself immersed in a world of command lines and editing of configuration files and environment variables that will remind you of all that was great about the 1980s ... or possibly not. And when that is all working you then need to export your product data to it, using various methods - there is a SQL Server connector which works fairly well but many prefer to post data to it as XML. And then you have to create a webservice-type process on your application to send it the user's query and parse the resulting list of matches and counts back into your application (again, XML is probably the best method).

So if your dataset is relatively small, I would stick with SQL Server. You can still get a sub-second response, and SQL 2014 will hopefully allow much bigger datasets. If your dataset is big then Solr will give remarkably fast results (it really is very fast) but be prepared to make a major investment in learning and supporting a whole new platform.

like image 34
Amarsys Avatar answered Sep 21 '22 17:09

Amarsys