Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently determine changes between rows using SQL

I have a very large MySQL table containing data read from a number of sensors. Essentially, there's a time stamp and a value column. I'll omit the sensor id, indexes other details here:

CREATE TABLE `data` (
  `time` datetime NOT NULL,
  `value` float NOT NULL
)

The value column rarely changes, and I need to find the points in time when those changes occur. Suppose there's a value every minute, the following query returns exactly what I need:

SELECT d.*, 
  (SELECT value FROM data WHERE time<d.time ORDER by time DESC limit 1) 
    AS previous_value 
FROM data d 
HAVING d.value<>previous_value OR previous_value IS NULL;

+---------------------+-------+----------------+
| time                | value | previous_value |
+---------------------+-------+----------------+
| 2011-05-23 16:05:00 |     1 |           NULL |
| 2011-05-23 16:09:00 |     2 |              1 |
| 2011-05-23 16:11:00 |   2.5 |              2 |
+---------------------+-------+----------------+

The only problem is that this is very inefficient, mostly due to the dependent subquery. What would be the best way to optimize this using the tools that MySQL 5.1 has to offer?

One last constraint is that the values are not ordered before they are inserted into the data table and that they might be updated at a later point. This might affect any possible de-normalization strategies.

like image 625
cg. Avatar asked May 24 '11 11:05

cg.


People also ask

How can I find the difference between two rows in SQL?

To calculate a difference, you need a pair of records; those two records are “the current record” and “the previous year's record”. You obtain this record using the LAG() window function. This function allows you to obtain data from the previous record (based on an order criterion, which here is “ ORDER BY year ”).

How do I compare two consecutive rows in SQL?

Here's the SQL query to compare each row with previous row. In the above query, we join sales table with itself using an INNER JOIN condition g2.id=g1.id + 1 that allows you to compare each row with its previous row. Please note, this condition depends on the fact that our id column has consecutive numbers.

How can you tell if a table changes in SQL?

Inspect the "TextData" column for the events: CREATE DATABASE, DROP DATABASE, ALTER DATABASE in order to know what database was changed. To find out who changed it and when it was changed, refer to the "LoginName" and "StartTime" columns respectively.


2 Answers

You might try this - I'm not going to guarantee that it will perform better, but it's my usual way of correlating a row with a "previous" row:

SELECT
    * --TODO, list columns
FROM
    data d
       left join
    data d_prev
       on
           d_prev.time < d.time --TODO - Other key columns?
       left join
    data d_inter
       on
           d_inter.time < d.time and
           d_prev.time < d_inter.time --TODO - Other key columns?
WHERE
    d_inter.time is null AND
    (d_prev.value is null OR d_prev.value <> d.value)

(I think this is right - could do with some sample data to validate it).

Basically, the idea is to join the table to itself, and for each row (in d), find candidate rows (in d_prev) for the "previous" row. Then do a further join, to try to find a row (in d_inter) that exists between the current row (in d) and the candidate row (in d_prev). If we cannot find such a row (d_inter.time is null), then that candidate was indeed the previous row.

like image 156
Damien_The_Unbeliever Avatar answered Sep 29 '22 00:09

Damien_The_Unbeliever


I suppose it's not an option for you to switch DB engine. In case it might be, then window functions would allow you to write things like this:

SELECT d.*
FROM (
    SELECT d.*, lag(d.value) OVER (ORDER BY d.time) as previous_value 
    FROM data d
  ) as d
WHERE d.value IS DISTINCT FROM d.previous_value;

If not, you could try to rewrite the query like so:

select data.*
from data
left join (
    select data.measure_id,
           data.time,
           max(prev_data) as prev_time
    from data
    left join data as prev_data
    on prev_data.time < data.time
    group by data.measure_id, data.time, data.value
    ) as prev_data_time
on prev_data_time.measure_id = data.measure_id
and prev_data_time.time = data.time
left join prev_data_value
on prev_data_value.measure_id = data.measure_id
and prev_data_value.time = prev_data_time.prev_time
where data.value <> prev_data_value.value or prev_data_value.value is null
like image 41
Denis de Bernardy Avatar answered Sep 29 '22 00:09

Denis de Bernardy