Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Indexing with GROUP BY, ORDER BY, and GROUP_CONCAT

SOLVED SEE BELOW

I am trying to use both GROUP BY and ORDER BY in my query where I retrieve data sorted by difficulty. I have to use the GROUP BY because of the GROUP CONCAT since some tables such as the 'lookup_peripheral', link multiple values to the same key (content_id). I understand why MYSQL cannot use a index when performing this task since the GROUP BY and ORDER BY statements do not share the same field. However, I am looking for alternative solutions that won't require a day to retrieve the results.

If I omit either the GROUP BY or ORDER BY clause, then the database uses an index, but the results lack either all of the peripherals or are not sorted by difficulty.

I am using the 'lookup_difficulty' table in the FROM so I can use that index in ordering the results. The lookup_xxxxx tables store each allowed value and then the other tables such as peripheral link the submission to the value via the content_id. Everything is referenced to the submission content_id. The content table holds essential info such as member id, name, etc.

I apologize if my post is not clear enough.

mysql> describe peripheral;
+------------------+----------+------+-----+---------+-------+
| Field            | Type     | Null | Key | Default | Extra |
+------------------+----------+------+-----+---------+-------+
| peripheral_id    | int(2)   | NO   | PRI | NULL    |       |
| peripheral       | char(30) | NO   |     | NULL    |       |
| peripheral_total | int(5)   | NO   |     | NULL    |       |
+------------------+----------+------+-----+---------+-------+

mysql> select * from peripheral;
+---------------+-----------------+------------------+
| peripheral_id | peripheral      | peripheral_total |
+---------------+-----------------+------------------+
|             1 | periph 1        |                0 |
|             2 | periph 2        |                1 |
|             3 | periph 3        |                3 |
+---------------+-----------------+------------------+

:

mysql> describe lookup_peripheral;
+---------------+---------+------+------+---------+-------+
| Field         | Type    | Null | Key  | Default | Extra |
+---------------+---------+------+------+---------+-------+
| content_id    | int(10) | NO   | INDEX| NULL    |       |
| peripheral_id | int(2)  | NO   |      | NULL    |       |
+---------------+---------+------+------+---------+-------+  


mysql> mysql> select * from lookup_peripheral;
+------------+---------------+
| content_id | peripheral_id |
+------------+---------------+
|         74 |             2 |
|         74 |             5 |
|         75 |             2 |
|         75 |             5 |
|         76 |             3 |
|         76 |             4 |
+------------+---------------+

The following is not using an index on lookup_difficulty, but rather a table sort and temporary table.

SELECT group_concat(DISTINCT peripheral.peripheral) as peripheral, content.member, .....
FROM (lookup_difficulty)
LEFT OUTER JOIN lookup_peripheral ON lookup_difficulty.content_id = lookup_peripheral.content_id
LEFT OUTER JOIN peripheral ON peripheral.peripheral_id = lookup_peripheral.peripheral_id
.....
LEFT OUTER JOIN programmer ON programmer.programmer_id = lookup_programmer.programmer_id
LEFT OUTER JOIN lookup_programming_language ON lookup_difficulty.content_id = lookup_programming_language.content_id

GROUP BY lookup_difficulty.content_id
ORDER BY lookup_dfficulty.difficulty_id
LIMIT 30    

The ultimate goal is to retrieve results sorted by difficulty with the correct peripherals attached. I think I need a sub-query to achieve this.


EDIT: ANSWER BELOW:

Figured it out. I did what I suscpected I had to do, which was to add a sub-query. Since MYSQL can only use one index per table, I was unable to GROUP BY and SORT BY together for my particular setup. Instead, I added another query that would use another index on a different table to group the peripherals together. Here what I added in the SELECT statement above:

(SELECT group_concat(DISTINCT peripheral.peripheral) as peripheral
FROM lookup_peripheral
LEFT OUTER JOIN peripheral ON peripheral.peripheral_id = lookup_peripheral.peripheral_id
WHERE lookup_difficulty.content_id = lookup_peripheral.content_id
GROUP BY lookup_peripheral.content_id
LIMIT 1) as peripheral

I used a LEFT OUTER since some entries do not have any peripherals. Total query time is now .02s on a 400MHz processor with 128MB of 100Hz RAM for a 40k row database for most of the tables.

EXPLAIN now gives me a USING INDEX for the lookup_difficulty table. I added this to achieve that:

ALTER TABLE `pictuts`.`lookup_difficulty` DROP PRIMARY KEY ,
ADD PRIMARY KEY ( `difficulty_id` , `content_id` ) 

Edit 2 I noticed that with large offsets by using pagination, the page will load considerably slower. You may have experienced this with other sites as well. Fortuatly, there is a way to avoid this as pointed out by Peter Zaitsev. Here is my updated snippet to achieve the same timings for offsets of 30K or 0:

FROM 
SELECT lookup_difficulty.content_id, lookup_difficulty.difficulty_id
FROM lookup_difficulty
LIMIT '.$offset.', '.$per_page.'
) ld

Now just add ld.whatever to every JOIN made and there you have it! My query look like a total mess now, but at least it is optimized. I don't think anyone will make it this far in reading this...

like image 904
Justin Avatar asked Oct 11 '22 00:10

Justin


1 Answers

Put in Justin's answer, so this question gets off the unanswered list:

Figured it out. I did what I suspected I had to do, which was to add a sub-query. Since MYSQL can only use one index per table, I was unable to GROUP BY and SORT BY together for my particular setup. Instead, I added another query that would use another index on a different table to group the peripherals together. Here what I added in the SELECT statement above:

(SELECT group_concat(DISTINCT p.peripheral) as peripheral
FROM lookup_peripheral lp
LEFT JOIN peripheral p ON p.peripheral_id = lp.peripheral_id
WHERE ld.content_id = lp.content_id
GROUP BY lp.content_id
LIMIT 1) as peripheral

I used a LEFT OUTER since some entries do not have any peripherals. Total query time is now .02s on a 400MHz processor with 128MB of 100Hz RAM for a 40k row database for most of the tables.

EXPLAIN now gives me a USING INDEX for the lookup_difficulty table. I added this to achieve that:

ALTER TABLE pictuts.lookup_difficulty DROP PRIMARY KEY ,
ADD PRIMARY KEY ( difficulty_id , content_id ) 

Edit 2 I noticed that with large offsets by using pagination, the page will load considerably slower. You may have experienced this with other sites as well. Fortunately, there is a way to avoid this as pointed out by Peter Zaitsev. Here is my updated snippet to achieve the same timings for offsets of 30K or 0:

FROM 
SELECT ld.content_id, ld.difficulty_id
FROM lookup_difficulty ld
LIMIT '.$per_page.' OFFSET '.$offset.' 
) ld

Now just add ld.whatever to every JOIN made and there you have it! My query look like a total mess now, but at least it is optimized. I don't think anyone will make it this far in reading this...

like image 60
Johan Avatar answered Oct 19 '22 19:10

Johan