Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize query for multiple columns extracted from the same table

This is a follow up of another question here on SO.

I have this two database tables (more tables omitted):

acquisitions (acq)
    id {PK}
    id_cu {FK}
    datetime
    { Unique Constraint: id_cu - datetime }

data
    id {PK}
    id_acq {FK acquisitions}
    id_meas
    id_elab
    value

Every possible id and datetimeare all indexed.

Granted I will not change db structure I need to extract data this way:

  • rows grouped by datetime
  • each column corresponding data.value for selected acq.id_cu - data.id_meas - data.id_elab combination. (see note on bottom of post)
  • allow empty cells if data is missing for some column but existing for others in a datetime

My current query is built this way (see SO question):

SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1

UNION

SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6

UNION

SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8

) AS T
WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

Here for retrieving just 3 columns but, as I said, columns are frequently more than 50.

It works flawlessly but I'd like to know if it can be optimized in speed.

This is MySQL EXPLAIN EXTENDED for the query above:

+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| id | select_type  | table        | type | possible_keys                                  | key                   | key_len | ref                    | rows  | filtered | Extra                                        |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
|  1 | PRIMARY      | <derived2>   | ALL  | NULL                                           | NULL                  | NULL    | NULL                   | 82466 |   100.00 | Using where; Using temporary; Using filesort |
|  2 | DERIVED      | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | id_cu                 | 4       |                        | 18011 |   100.00 |                                              |
|  2 | DERIVED      | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
|  3 | UNION        | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | ix_acquisitions_id_cu | 4       |                        | 20864 |   100.00 |                                              |
|  3 | UNION        | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
|  4 | UNION        | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | id_cu                 | 4       |                        | 31848 |   100.00 |                                              |
|  4 | UNION        | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
| NULL | UNION RESULT | <union2,3,4> | ALL  | NULL                                           | NULL                  | NULL    | NULL                   |  NULL |     NULL |                                              |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
8 rows in set, 1 warning (8.24 sec)

Currently with (edit: checked today) 390k acquisitions and 9.2M data values (and growing) it takes about 10 minutes to extract a table of 59 columns. I'm aware the precedent software took up to 1 hour to extract data.

Thanks for your patience reading till here :)


Update

After Denis answer I tried his changes 1. and 2., this is the result of the new query:

SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

UNION ALL

SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

UNION ALL

SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3 
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

) AS T GROUP BY datetime

and here the new EXPLAIN EXTENDED:

+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type  | table        | type  | possible_keys                                                | key            | key_len | ref                    | rows  | filtered | Extra                           |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
|  1 | PRIMARY      | <derived2>   | ALL   | NULL                                                         | NULL           | NULL    | NULL                   | 51997 |   100.00 | Using temporary; Using filesort |
|  2 | DERIVED      | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 14827 |   100.00 | Using where                     |
|  2 | DERIVED      | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  3 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 18663 |   100.00 | Using where                     |
|  3 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  4 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 13260 |   100.00 | Using where                     |
|  4 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
| NULL | UNION RESULT | <union2,3,4> | ALL   | NULL                                                         | NULL           | NULL    | NULL                   |  NULL |     NULL |                                 |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.01 sec)

a good gain in performace with no doubt


Update (2)

This comes adding point 3.

EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

UNION ALL

SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

UNION ALL

SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3 
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime

) AS T GROUP BY datetime;

and this is the result of EXPLAIN EXTENDED

+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type  | table        | type  | possible_keys                                                | key            | key_len | ref                    | rows  | filtered | Extra                           |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
|  1 | PRIMARY      | <derived2>   | ALL   | NULL                                                         | NULL           | NULL    | NULL                   | 51997 |   100.00 | Using temporary; Using filesort |
|  2 | DERIVED      | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 14827 |   100.00 | Using where                     |
|  2 | DERIVED      | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  3 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 18663 |   100.00 | Using where                     |
|  3 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
|  4 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 13260 |   100.00 | Using where                     |
|  4 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
| NULL | UNION RESULT | <union2,3,4> | ALL   | NULL                                                         | NULL           | NULL    | NULL                   |  NULL |     NULL |                                 |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.06 sec)

Just a little slower, is this supposed to benefit from large number of coulmns? I'll try it...


Update (3)

I tried with and without MAX(data.value)... GROUP BY datetime and, on a 60 columns query I get better results with. Results vary from try to try, this is one of them.

  • original query 9m12.144s
  • with Denis' 1. and 2. 4m6.597s
  • with Denis' 1., 2. and 3. 4m0.210s

It's about 57% less time required.


Update (4)

I tried Andiry solution but it is way slower than Denis optimization.

Tested on a 3 combinations/columns:

  • unoptimized: 1m3s
  • Denis' optimization: 1.7s
  • Andiry's CASE: 9.3s

I also tested on a 12 combinations/columns:

  • unoptimized: untested
  • Denis' optimization: 3.6s
  • Andiry's CASE: 13.7s

Moreover Andiry's solution also pulls in acquisition dates where there's no data for any of the selected combination but existing for others.

Immagine Control Unit 1 gets data every 30 minutes at :00 and :30 while Control Unit 2 at :15 and :45: I'll double the number of rows with NULL filled empty ones.


Note:

It's all about a sensors system: there are several control units (one for each id_cu) with many sensors each.

A single sensor is identified by a id_cu / id_meas couple and sends different elaborations for each measure, say MIN (id_elab=1), MAX (id_elab=2), AVERAGE (id_elab=3), INSTANT (id_elab=...) etc, one for each id_elab.

The user is free to pickup as many elaborations he wants, say:

  • AVERAGE value (3) of sensor #3 of control unit #1 for a results column so id_cu=1 / id_meas=3 / id_elab=3
  • AVERAGE value (3) of sensor #5 of control unit #1 for a results column so id_cu=1 / id_meas=5 / id_elab=3
  • MIN value (1) of sensor #2 of control unit #4 for another column so id_cu=4 / id_meas=2 / id_elab=1
  • (put any valid id_cu, id_meas, id_elab combination)
  • ...

and so on, up to tens of choices...

Here's a partial DDL (irrelevant tables excluded):

CREATE TABLE acquisitions (
    id INTEGER NOT NULL AUTO_INCREMENT, 
    id_cu INTEGER NOT NULL, 
    datetime DATETIME NOT NULL, 
    PRIMARY KEY (id), 
    UNIQUE (id_cu, datetime), 
    FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
)

CREATE TABLE data (
    id INTEGER NOT NULL AUTO_INCREMENT, 
    id_acq INTEGER NOT NULL, 
    id_meas INTEGER NOT NULL, 
    id_elab INTEGER NOT NULL, 
    value FLOAT, 
    PRIMARY KEY (id), 
    FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
)

CREATE TABLE ctrl_units (
    id INTEGER NOT NULL, 
    name VARCHAR(40) NOT NULL, 
    PRIMARY KEY (id)
)

CREATE TABLE sensors (
    id_cu INTEGER NOT NULL, 
    id_meas INTEGER NOT NULL, 
    id_elab INTEGER NOT NULL, 
    name VARCHAR(40) NOT NULL, 
    `desc` VARCHAR(80), 
    PRIMARY KEY (id_cu, id_meas), 
    FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
)

1 Answers

There are three main issues:

  1. Use union all, rather than union. you're grouping and fetching min/max values, so there's no point in introducing a sort step to drop duplicate rows.

  2. The where clause could be placed in each of the union substatements:

    select ...
    from (
    select ... from ...  where ...
    union all
    select ... from ...  where ...
    union all
    ...
    )
    group by ...
    

    The way you've written it, it starts by fetching all rows, appends them all, and finally filters the ones you need. Injecting the where clause in the union substatements will make it fetch only the rows you need, finally append them all.

  3. Along the same lines, pre-aggregate the aggregates:

    select ..., max(foo) as foo
    from (
    select ..., max(foo) as foo from ...  where ... group by ...
    union all
    select ..., max(foo) as foo from ...  where ... group by ...
    union all
    ...
    )
    group by ...
    

    The optimizer will make better use of existing indexes, and you'll end up appending only a few rows, rather than millions.

like image 152
Denis de Bernardy Avatar answered Jun 06 '26 05:06

Denis de Bernardy



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!