Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Search in grouped columns in MySQL?

Tags:

sql

mysql

I need to create a database of guys, guys can have one or more attributes, and each guy's attribute has a specific value, sounds easy eh? well, keep reading as the problem kinda gets impossible (5 days dealing with it :s).

So I create this 3 tables:

CREATE TABLE guy (
  id int(11),
  name varchar(255)
);

CREATE TABLE attribute (
  id int(11),
  name varchar(255)
);

-- each value references one guy and one attribute
CREATE TABLE _value (
  id int(11),
  guy_id int(11),
  attribute_id int(11),
  _value varchar(255)
);

with this example data:

INSERT INTO attribute VALUES (1, 'age'), (2, 'dollars'), (3, 'candies');
INSERT INTO guy VALUES (1, 'John'), (2, 'Bob');
INSERT INTO _value VALUES (1, 1, 1, 12), (2, 1, 2, 15), (3, 1, 3, 3);
INSERT INTO _value VALUES (4, 2, 1, 15), (5, 2, 2, 20), (6, 2, 3, 6);

and create this query:

SELECT g.name 'guy', a.name 'attribute', v._value 'value' 
FROM guy g 
JOIN _value v ON g.id = v.guy_id 
JOIN attribute a ON a.id = v.attribute_id;

which gives me this result:

+------+-----------+-------+
| guy  | attribute | value |
+------+-----------+-------+
| John | age       | 12    |
| John | dollars   | 15    |
| John | candies   | 3     |
| Bob  | age       | 15    |
| Bob  | dollars   | 20    |
| Bob  | candies   | 6     |
+------+-----------+-------+

THIS IS THE REAL PROBLEM:

Later on, my boss tells me he wants to filter data using as many conditions as he wants an be able to groups those conditions with "ands" and "ors", for example, he may want to do this crazy condition:

Get guys which age is greater than 10, have less than 18 dollars, have more than 2 candies and less than 10 candies, but no matter what, also include guys which age is exactly 15. This would translate to this filter:

-- should return both John and Bob
(age > 10 and dollars < 18 and candies > 2 and candies < 10) or (age = 15)

I have no problem creating the filter (I use jqgrid for that), the problem is that attributes aren't columns, but rows instead, and because of that I don't know how to mix the query with the filter, I tried with something like this:

SELECT g.name 'guy', a.name 'attribute', v._value 'value' 
FROM guy g 
JOIN _value v ON g.id = v.guy_id 
JOIN attribute a ON a.id = v.attribute_id
GROUP BY guy
HAVING (
    (attribute = 'age' and value > 10) AND
    (attribute = 'dollars' and value < 18) AND
    (attribute = 'candies' and value > 2) AND
    (attribute = 'candies' and value < 10)
       )
OR
       (
     (attribute = 'age' and value = 15)
       )

but only Bob is returned :( and I should get both John and Bob.

SO, HOW SHOULD I MIX THE FILTER AND THE QUERY?

Have in mind that the number of attributes each guy has is the same for all guys, but more attributes and more guys can be added anytime, for example, if I want to add the guy 'Mario' I would do:

-- we insert the guy Mario
INSERT INTO guy VALUES (3, 'Mario');
-- with age = 5, dollars = 100 and candies = 1
INSERT INTO _value VALUES (7, 3, 1, 5), (8, 3, 2, 100), (9, 3, 3, 1);

And if I want to create the attribute 'apples' I would do:

-- we insert the attribute apples
INSERT INTO attribute VALUES (4, 'apples');
-- we create a value for each guy's new attribute, John as 7 apples, Bob has 3 and Mario has 8
INSERT INTO _value VALUES (10, 1, 4, 7), (11, 2, 4, 2), (12, 3, 4, 8);

and now I should be able to include conditions about apples in my query.

I hope I made myself understandable, thank you for all your time :)

Note: Maybe If there were a way to put all of each guys attributes in one row?, something like this:

+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+
| guy  | attribute | value | guy  | attribute  | value  | guy  | attribute  | value  | guy  | attribute  | value  |
+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+
| John | age       |    12 | John | dollars    |     15 | John | candies    |      3 | John | apples     |      7 |
| Bob  | age       |    15 | Bob  | dollars    |     20 | Bob  | candies    |      6 | Bob  | apples     |      2 |
| Mario| age       |    5  | Mario| dollars    |     100| Mario| candies    |      1 | Mario| apples     |      8 |
+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+

Note 2: @iim suggested (In this question: How to search in grouped columns in MySQL? (also in Hibernate if possible)) that I could do a self-join for each attribute, and yes that may solve the problem, but there may be performance problems when guys have tons of attributes (like 30 or more).

Note 3: I cannot change the database schema :(

like image 457
ilovelamp Avatar asked Oct 19 '11 00:10

ilovelamp


People also ask

Does GROUP BY use index MySQL?

The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE ).

How do I group columns in MySQL?

The MySQL GROUP BY Statement The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.

What is Nonaggregated column in MySQL?

If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.

Does GROUP BY sort data in MySQL?

In MySQL, GROUP BY is used for sorting, so the server may also apply ORDER BY optimizations to grouping.


2 Answers

what about something like this?

SELECT g.name 'guy', a.name 'attribute', v._value 'value' 
FROM guy g 
JOIN _value v1 ON g.id = v1.guy_id 
  JOIN attribute a1 ON a1.id = v1.attribute_id
JOIN _value v2 ON g.id = v2.guy_id 
  JOIN attribute a2 ON a2.id = v2.attribute_id
JOIN _value v3 ON g.id = v3.guy_id 
  JOIN attribute a3 ON a3.id = v3.attribute_id
JOIN _value v4 ON g.id = v4.guy_id 
  JOIN attribute a4 ON a4.id = v4.attribute_id
JOIN _value v5 ON g.id = v5.guy_id 
  JOIN attribute a5 ON a5.id = v5.attribute_id
WHERE (
    (a1 = 'age' and v1 > 10) AND
    (a2 = 'dollars' and v2 < 18) AND
    (a3 = 'candies' and v3 > 2) AND
    (a4 = 'candies' and v4 < 10)
  ) OR (a5 = 'age' and v5 = 15)

edit fixing a few dumb errors:

SELECT DISTINCT g.id, g.name 'guy'
FROM guy g 
JOIN _value v1 ON g.id = v1.guy_id 
  JOIN attribute a1 ON a1.id = v1.attribute_id
JOIN _value v2 ON g.id = v2.guy_id 
  JOIN attribute a2 ON a2.id = v2.attribute_id
JOIN _value v3 ON g.id = v3.guy_id 
  JOIN attribute a3 ON a3.id = v3.attribute_id
JOIN _value v4 ON g.id = v4.guy_id 
  JOIN attribute a4 ON a4.id = v4.attribute_id
JOIN _value v5 ON g.id = v5.guy_id 
  JOIN attribute a5 ON a5.id = v5.attribute_id
WHERE (
    (a1.name = 'age' and v1._value > 10) AND
    (a2.name = 'dollars' and v2._value < 18) AND
    (a3.name = 'candies' and v3._value > 2) AND
    (a4.name = 'candies' and v4._value < 10)
  ) OR (a5.name = 'age' and v5._value = 15)

specifically, i forgot about the fieldnames in the WHERE clause, select only the 'guy' fields, and added DISTINCT to get only one row for each guy.

like image 192
Javier Avatar answered Sep 29 '22 20:09

Javier


The following will let you make your conditions more or less straightforward, though I can't promise it will be really efficient with 100,000+ guys with 30+ attributes. That you should see for yourself.

SELECT g.name guy, a.name attribute, v._value value
FROM guy g 
JOIN _value v ON g.id = v.guy_id 
JOIN attribute a ON a.id = v.attribute_id
GROUP BY guy
HAVING (
    SUM(a.name = 'age'     and v._value > 10) = 1 AND
    SUM(a.name = 'dollars' and v._value < 18) = 1 AND
    SUM(a.name = 'candies' and v._value > 2 ) = 1 AND
    SUM(a.name = 'candies' and v._value < 10) = 1
       )
OR
       (
    SUM(a.name = 'age'     and v._value = 15) = 1
       )

(I'm assuming here that a guy cannot have duplicate attributes.)

like image 35
Andriy M Avatar answered Sep 29 '22 18:09

Andriy M