Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement filter system in SQL?

Tags:

sql

mysql

Right now I am planning to add a filter system to my site.

Examples:

(ID=apple, COLOR=red, TASTE=sweet, ORIGIN=US)
(ID=mango, COLOR=yellow, TASTE=sweet, ORIGIN=MEXICO)
(ID=banana, COLOR=yellow, TASTE=bitter-sweet, ORIGIN=US)

so now I am interested in doing the following: SELECT ID FROM thisTable WHERE COLOR='yellow' AND TASTE='SWEET'

But my problem is I am doing this for multiple categories in my site, and the columns are NOT consistent. (like if the table is for handphones, then it will be BRAND, 3G-ENABLED, PRICE, COLOR, WAVELENGTH, etc)

how could I design a general schema that allows this?

Right now I am planning on doing:

table(ID, KEY, VALUE)

This allows arbitary number of columns, but for the query, I am using SELECT ID FROM table WHERE (KEY=X1 AND VALUE=V1) AND (KEY=X2 AND VALUE=V2), .. which returns an empty set.

Can someone recommend a good solution to this? Note that the number of columns WILL change regularly

like image 313
crapbag Avatar asked May 04 '10 04:05

crapbag


2 Answers

The entity-attribute-value model that you suggest could fit in this scenario.

Regarding the filtering query, you have to understand that with the EAV model you will sacrifice plenty of query power, so this can become quite tricky. However this one way to tackle your problem:

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches
           FROM      table
           WHERE     (`key` = X1 AND `value` = V1) OR 
                     (`key` = X2 AND `value` = V2) 
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

One inelegant feature of this approach is that you need to specify the number of attribute/value pairs that you expect to match in sub_t.matches = 2. If we had three conditions we would have had to specify sub_t.matches = 3, and so on.

Let's build a test case:

CREATE TABLE stuff (`id` varchar(20), `key` varchar(20), `value` varchar(20));

INSERT INTO stuff VALUES ('apple',  'color',  'red');
INSERT INTO stuff VALUES ('mango',  'color',  'yellow');
INSERT INTO stuff VALUES ('banana', 'color',  'yellow');

INSERT INTO stuff VALUES ('apple',  'taste',  'sweet');
INSERT INTO stuff VALUES ('mango',  'taste',  'sweet');
INSERT INTO stuff VALUES ('banana', 'taste',  'bitter-sweet');

INSERT INTO stuff VALUES ('apple',  'origin',  'US');
INSERT INTO stuff VALUES ('mango',  'origin',  'MEXICO');
INSERT INTO stuff VALUES ('banana', 'origin',  'US');

Query:

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.02 sec)

Now let's insert another fruit with color=yellow and taste=sweet:

INSERT INTO stuff VALUES ('pear', 'color', 'yellow');
INSERT INTO stuff VALUES ('pear', 'taste', 'sweet');
INSERT INTO stuff VALUES ('pear', 'origin', 'somewhere');

The same query would return:

+-------+
| id    |
+-------+
| mango |
| pear  |
+-------+
2 rows in set (0.00 sec)

If we want to restrict this result to entities with origin=MEXICO, we would have to add another OR condition and check for sub_t.matches = 3 instead of 2.

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet') OR 
                     (`key` = 'origin' AND `value` = 'MEXICO')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 3 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.00 sec)

As in every approach, there are certain advantages and disadvantages when using the EAV model. Make sure you research the topic extensively in the context of your application. You may even want to consider an alternative relational databases, such as Cassandra, CouchDB, MongoDB, Voldemort, HBase, SimpleDB or other key-value stores.

like image 156
Daniel Vassallo Avatar answered Oct 02 '22 18:10

Daniel Vassallo


The following worked for me:

SELECT * FROM mytable t WHERE 
    t.key = "key" AND t.value = "value" OR
    t.key = "key" AND t.value = "value" OR
    ....
    t.key = "key" AND t.value = "value"
GROUP BY t.id having count(*)=3;

count(*)=3 must match the amount of

t.key = "key" AND t.value = "value"

cases

like image 31
Tarlog Avatar answered Oct 02 '22 16:10

Tarlog