I need to create a database of guys, guys can have one or more attributes, and each guy's attribute has a specific value, sounds easy eh? well, keep reading as the problem kinda gets impossible (5 days dealing with it :s).
So I create this 3 tables:
CREATE TABLE guy (
id int(11),
name varchar(255)
);
CREATE TABLE attribute (
id int(11),
name varchar(255)
);
-- each value references one guy and one attribute
CREATE TABLE _value (
id int(11),
guy_id int(11),
attribute_id int(11),
_value varchar(255)
);
with this example data:
INSERT INTO attribute VALUES (1, 'age'), (2, 'dollars'), (3, 'candies');
INSERT INTO guy VALUES (1, 'John'), (2, 'Bob');
INSERT INTO _value VALUES (1, 1, 1, 12), (2, 1, 2, 15), (3, 1, 3, 3);
INSERT INTO _value VALUES (4, 2, 1, 15), (5, 2, 2, 20), (6, 2, 3, 6);
and create this query:
SELECT g.name 'guy', a.name 'attribute', v._value 'value'
FROM guy g
JOIN _value v ON g.id = v.guy_id
JOIN attribute a ON a.id = v.attribute_id;
which gives me this result:
+------+-----------+-------+
| guy | attribute | value |
+------+-----------+-------+
| John | age | 12 |
| John | dollars | 15 |
| John | candies | 3 |
| Bob | age | 15 |
| Bob | dollars | 20 |
| Bob | candies | 6 |
+------+-----------+-------+
THIS IS THE REAL PROBLEM:
Later on, my boss tells me he wants to filter data using as many conditions as he wants an be able to groups those conditions with "ands" and "ors", for example, he may want to do this crazy condition:
Get guys which age is greater than 10, have less than 18 dollars, have more than 2 candies and less than 10 candies, but no matter what, also include guys which age is exactly 15. This would translate to this filter:
-- should return both John and Bob
(age > 10 and dollars < 18 and candies > 2 and candies < 10) or (age = 15)
I have no problem creating the filter (I use jqgrid for that), the problem is that attributes aren't columns, but rows instead, and because of that I don't know how to mix the query with the filter, I tried with something like this:
SELECT g.name 'guy', a.name 'attribute', v._value 'value'
FROM guy g
JOIN _value v ON g.id = v.guy_id
JOIN attribute a ON a.id = v.attribute_id
GROUP BY guy
HAVING (
(attribute = 'age' and value > 10) AND
(attribute = 'dollars' and value < 18) AND
(attribute = 'candies' and value > 2) AND
(attribute = 'candies' and value < 10)
)
OR
(
(attribute = 'age' and value = 15)
)
but only Bob is returned :( and I should get both John and Bob.
SO, HOW SHOULD I MIX THE FILTER AND THE QUERY?
Have in mind that the number of attributes each guy has is the same for all guys, but more attributes and more guys can be added anytime, for example, if I want to add the guy 'Mario' I would do:
-- we insert the guy Mario
INSERT INTO guy VALUES (3, 'Mario');
-- with age = 5, dollars = 100 and candies = 1
INSERT INTO _value VALUES (7, 3, 1, 5), (8, 3, 2, 100), (9, 3, 3, 1);
And if I want to create the attribute 'apples' I would do:
-- we insert the attribute apples
INSERT INTO attribute VALUES (4, 'apples');
-- we create a value for each guy's new attribute, John as 7 apples, Bob has 3 and Mario has 8
INSERT INTO _value VALUES (10, 1, 4, 7), (11, 2, 4, 2), (12, 3, 4, 8);
and now I should be able to include conditions about apples in my query.
I hope I made myself understandable, thank you for all your time :)
Note: Maybe If there were a way to put all of each guys attributes in one row?, something like this:
+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+
| guy | attribute | value | guy | attribute | value | guy | attribute | value | guy | attribute | value |
+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+
| John | age | 12 | John | dollars | 15 | John | candies | 3 | John | apples | 7 |
| Bob | age | 15 | Bob | dollars | 20 | Bob | candies | 6 | Bob | apples | 2 |
| Mario| age | 5 | Mario| dollars | 100| Mario| candies | 1 | Mario| apples | 8 |
+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+
Note 2: @iim suggested (In this question: How to search in grouped columns in MySQL? (also in Hibernate if possible)) that I could do a self-join for each attribute, and yes that may solve the problem, but there may be performance problems when guys have tons of attributes (like 30 or more).
Note 3: I cannot change the database schema :(
The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE ).
The MySQL GROUP BY Statement The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.
If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.
In MySQL, GROUP BY is used for sorting, so the server may also apply ORDER BY optimizations to grouping.
what about something like this?
SELECT g.name 'guy', a.name 'attribute', v._value 'value'
FROM guy g
JOIN _value v1 ON g.id = v1.guy_id
JOIN attribute a1 ON a1.id = v1.attribute_id
JOIN _value v2 ON g.id = v2.guy_id
JOIN attribute a2 ON a2.id = v2.attribute_id
JOIN _value v3 ON g.id = v3.guy_id
JOIN attribute a3 ON a3.id = v3.attribute_id
JOIN _value v4 ON g.id = v4.guy_id
JOIN attribute a4 ON a4.id = v4.attribute_id
JOIN _value v5 ON g.id = v5.guy_id
JOIN attribute a5 ON a5.id = v5.attribute_id
WHERE (
(a1 = 'age' and v1 > 10) AND
(a2 = 'dollars' and v2 < 18) AND
(a3 = 'candies' and v3 > 2) AND
(a4 = 'candies' and v4 < 10)
) OR (a5 = 'age' and v5 = 15)
edit fixing a few dumb errors:
SELECT DISTINCT g.id, g.name 'guy'
FROM guy g
JOIN _value v1 ON g.id = v1.guy_id
JOIN attribute a1 ON a1.id = v1.attribute_id
JOIN _value v2 ON g.id = v2.guy_id
JOIN attribute a2 ON a2.id = v2.attribute_id
JOIN _value v3 ON g.id = v3.guy_id
JOIN attribute a3 ON a3.id = v3.attribute_id
JOIN _value v4 ON g.id = v4.guy_id
JOIN attribute a4 ON a4.id = v4.attribute_id
JOIN _value v5 ON g.id = v5.guy_id
JOIN attribute a5 ON a5.id = v5.attribute_id
WHERE (
(a1.name = 'age' and v1._value > 10) AND
(a2.name = 'dollars' and v2._value < 18) AND
(a3.name = 'candies' and v3._value > 2) AND
(a4.name = 'candies' and v4._value < 10)
) OR (a5.name = 'age' and v5._value = 15)
specifically, i forgot about the fieldnames in the WHERE
clause, select only the 'guy' fields, and added DISTINCT
to get only one row for each guy.
The following will let you make your conditions more or less straightforward, though I can't promise it will be really efficient with 100,000+ guys with 30+ attributes. That you should see for yourself.
SELECT g.name guy, a.name attribute, v._value value
FROM guy g
JOIN _value v ON g.id = v.guy_id
JOIN attribute a ON a.id = v.attribute_id
GROUP BY guy
HAVING (
SUM(a.name = 'age' and v._value > 10) = 1 AND
SUM(a.name = 'dollars' and v._value < 18) = 1 AND
SUM(a.name = 'candies' and v._value > 2 ) = 1 AND
SUM(a.name = 'candies' and v._value < 10) = 1
)
OR
(
SUM(a.name = 'age' and v._value = 15) = 1
)
(I'm assuming here that a guy cannot have duplicate attributes.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With