SQL Find pairs of rows with next best timestamp match

Q: How to select records from the last row in SQL Server?

The TOP clause in SQL Server is used to control the number or percentage of rows from the result. And to select the records from the last, we have to arrange the rows in descending order. We can use the following script to implement the task and achieve the desired result.

Q: How to select 10 records from a table in SQL Server?

In SQL Server, we can easily select the last 10 records from a table by using the “SELECT TOP” statement. The TOP clause in SQL Server is used to control the number or percentage of rows from the result.

Q: How to extract specific piece of information from the timestamp in SQL?

A few functions like EXTRACT in SQL let us extract a specific piece of information from the timestamp. For example, we can extract DAY, MONTH, YEAR, HOUR, MINUTE, SECONDS, etc., from the timestamp. In the following examples, we have tried to extract DAY and MONTH from the timestamp.

Tags:

sql

datetime

postgresql

window-functions

My challenge is to find pairs of rows that are adjacent by timestamp and keep only those pairs with minimal distance of a value field (positive values of the difference)

A table measurement collects data from different sensors with a timestamp and a value.

id | sensor_id | timestamp | value
---+-----------+-----------+------
 1 |         1 | 12:00:00  |     5
 2 |         2 | 12:01:00  |     6
 3 |         1 | 12:02:00  |     4
 4 |         2 | 12:02:00  |     7
 5 |         2 | 12:03:00  |     3
 6 |         1 | 12:05:00  |     3
 7 |         2 | 12:06:00  |     4
 8 |         2 | 12:07:00  |     5
 9 |         1 | 12:08:00  |     6

A sensor's value is valid from its timestamp until the timestamp of its next record (same sensor_id).

Graphical representation

enter image description here

The lower green line shows the distance of sensor 1's (blue line) and sensor 2's (red line) values over time.

My aim is

to combine only those records of 2 sensors that match the timestamp logic (to get the green line)
to find the dinstance local minimums at
- 12:01:00 (at 12:00:00 there's no record for sensor 2)
- 12:05:00
- 12:08:00

The real table resides in a PostgreSQL database and contains about 5 million records of 15 sensors.

Test data

create table measurement (
    id serial,
    sensor_id integer,
    timestamp timestamp,
    value integer)
;

insert into measurement (sensor_id, timestamp, value)
values
(1, '2020-08-16 12:00:00', 5),
(2, '2020-08-16 12:01:00', 6),
(1, '2020-08-16 12:02:00', 4),
(2, '2020-08-16 12:02:00', 7),
(2, '2020-08-16 12:03:00', 3),
(1, '2020-08-16 12:05:00', 3),
(2, '2020-08-16 12:06:00', 4),
(2, '2020-08-16 12:07:00', 5),
(1, '2020-08-16 12:08:00', 6)
;

My approach

was to pick 2 arbitrary sensors (by certain sensor_ids), make a self join and retain for any sensor 1's record only that record of the sensor 2 with the previous timestamp (biggest timestamps of sensor 2 with sensor 1's timestamp <= sensor 2's timestamp).

select
*
from (
    select
    *,
    row_number() over (partition by m1.timestamp order by m2.timestamp desc) rownum
    from measurement m1
    join measurement m2
        on m1.sensor_id <> m2.sensor_id
        and m1.timestamp >= m2.timestamp
    --arbitrarily sensor_ids 1 and 2
    where m1.sensor_id = 1
    and m2.sensor_id = 2
) foo
where rownum = 1

union --vice versa

select
*
from (
    select
    *,
    row_number() over (partition by m2.timestamp order by m1.timestamp desc) rownum
    from measurement m1
    join measurement m2
        on m1.sensor_id <> m2.sensor_id
        and m1.timestamp <= m2.timestamp
    --arbitrarily sensor_ids 1 and 2
    where m1.sensor_id = 1
    and m2.sensor_id = 2
) foo
where rownum = 1
;

But that returns a pair with 12:00:00 where sensor 2 has no data (not a big problem)
and on the real table the statement execution does not end after hours (big problem).

I found certain similar questions but they don't match my problem

SQL Join on Nearest less than date
SQL Join same table based on time stamp and inventory level

Thanks in advance!

393

asked Aug 16 '20 15:08

Daniel

4 Answers

The first step is to calculate the difference at each timestamp. One method uses a lateral join and conditional aggregation:

select t.timestamp,
       max(m.value) filter (where s.sensor_id = 1) as value_1,
       max(m.value) filter (where s.sensor_id = 2) as value_2,
       abs(max(m.value) filter (where s.sensor_id = 2) -
           max(m.value) filter (where s.sensor_id = 1)
          ) as diff
from (values (1), (2)) s(sensor_id) cross join
     (select distinct timestamp
      from measurement
      where sensor_id in (1, 2)
     ) t left join lateral
     (select m.value
      from measurement m 
      where m.sensor_id = s.sensor_id and
            m.timestamp <= t.timestamp
      order by m.timestamp desc
      limit 1 
     ) m
     on 1=1
group by timestamp;

Now the question is when does the difference enter a local minimum. For your sample data, the local minima are all one time unit long. That means that you can use lag() and lead() to find them:

with t as (
      select  t.timestamp,
              max(m.value) filter (where s.sensor_id = 1) as value_1,
              max(m.value) filter (where s.sensor_id = 2) as value_2,
              abs(max(m.value) filter (where s.sensor_id = 2) -
                  max(m.value) filter (where s.sensor_id = 1)
                 ) as diff
      from (values (1), (2)) s(sensor_id) cross join
           (select distinct timestamp
            from measurement
            where sensor_id in (1, 2)
           ) t left join lateral
           (select m.value
            from measurement m 
            where m.sensor_id = s.sensor_id and
                  m.timestamp <= t.timestamp
            order by m.timestamp desc
            limit 1 
           ) m
           on 1=1
      group by timestamp
     )
select *
from (select t.*,
             lag(diff) over (order by timestamp) as prev_diff,
             lead(diff) over (order by timestamp) as next_diff
      from t
     ) t
where (diff < prev_diff or prev_diff is null) and
      (diff < next_diff or next_diff is null);

That might not be a reasonable assumption to make. So, filter out adjacent duplicate values before applying this logic:

select *
from (select t.*,
             lag(diff) over (order by timestamp) as prev_diff,
             lead(diff) over (order by timestamp) as next_diff
      from (select t.*, lag(diff) over (order by timestamp) as test_for_dup
            from t
           ) t
      where test_for_dup is distinct from diff
     ) t
where (diff < prev_diff or prev_diff is null) and
      (diff < next_diff or next_diff is null)

Here is a db<>fiddle.

185

answered Nov 15 '22 04:11

Gordon Linoff

You can use a couple of lateral joins. For example:

with
t as (select distinct timestamp as ts from measurement)
select
  t.ts, s1.value as v1, s2.value as v2,
  abs(s1.value - s2.value) as distance
from t,
lateral (
  select value
  from measurement m 
  where m.sensor_id = 1 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s1,
lateral (
  select value
  from measurement m 
  where m.sensor_id = 2 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s2
order by t.ts

Result:

ts                     v1  v2  distance
---------------------  --  --  --------
2020-08-16 12:01:00.0   5   6         1
2020-08-16 12:02:00.0   4   7         3
2020-08-16 12:03:00.0   4   3         1
2020-08-16 12:05:00.0   3   3         0
2020-08-16 12:06:00.0   3   4         1
2020-08-16 12:07:00.0   3   5         2
2020-08-16 12:08:00.0   6   5         1

See running example at DB Fiddle.

Also, if you want all timestamps, even unmatched ones like 12:00:00, you can do:

with
t as (select distinct timestamp as ts from measurement)
select
  t.ts, s1.value as v1, s2.value as v2,
  abs(s1.value - s2.value) as distance
from t
left join lateral (
  select value
  from measurement m 
  where m.sensor_id = 1 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s1 on true
left join lateral (
  select value
  from measurement m 
  where m.sensor_id = 2 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s2 on true
order by t.ts

In those cases it's not possible to compute the distance, though.

Result:

ts                     v1      v2  distance
---------------------  --  ------  --------
2020-08-16 12:00:00.0   5  <null>    <null>
2020-08-16 12:01:00.0   5       6         1
2020-08-16 12:02:00.0   4       7         3
2020-08-16 12:03:00.0   4       3         1
2020-08-16 12:05:00.0   3       3         0
2020-08-16 12:06:00.0   3       4         1
2020-08-16 12:07:00.0   3       5         2
2020-08-16 12:08:00.0   6       5         1

answered Nov 15 '22 04:11

The Impaler

The infill of missing values requires window functions and a Cartesian product of every minute crossed with your two sensors.

The invars cte accepts the parameters.

with invars as (
  select '2020-08-16 12:00:00'::timestamp as start_ts,
         '2020-08-16 12:08:00'::timestamp as end_ts,
         array[1, 2] as sensor_ids
),

Create the matrix of minute x sensor_id

calendar as (
  select g.minute, s.sensor_id, 
         sensor_ids[1] as sid1,
         sensor_ids[2] as sid2
    from invars i
   cross join generate_series(
           i.start_ts, i.end_ts, interval '1 minute'
         ) as g(minute)
   cross join unnest(i.sensor_ids) as s(sensor_id)
),

Find mgrp for every time a new value is available from a sensor_id

gaps as (
  select c.minute, c.sensor_id, m.value,
         sum(case when m.value is null then 0 else 1 end)
            over (partition by c.sensor_id 
                      order by c.minute) as mgrp,
         c.sid1, c.sid2
    from calendar c
         left join measurement m
                on m.timestamp = c.minute 
               and m.sensor_id = c.sensor_id
),

Interpolate missing values by carrying forward the most recent value

interpolated as (
  select minute, 
         sensor_id,
         coalesce(
           value, first_value(value) over
                    (partition by sensor_id, mgrp
                         order by minute)
         ) as value, sid1, sid2
    from gaps
)

Perform the distance calculation (sum() could have been max() or min()--it makes no difference.

select minute,
       sum(value) filter (where sensor_id = sid1) as value1,
       sum(value) filter (where sensor_id = sid2) as value2, 
       abs(
         sum(value) filter (where sensor_id = sid1) 
         - sum(value) filter (where sensor_id = sid2)
       ) as distance
  from interpolated
 group by minute
 order by minute;

Results:

| minute                   | value1 | value2 | distance |
| ------------------------ | ------ | ------ | -------- |
| 2020-08-16T12:00:00.000Z | 5      |        |          |
| 2020-08-16T12:01:00.000Z | 5      | 6      | 1        |
| 2020-08-16T12:02:00.000Z | 4      | 7      | 3        |
| 2020-08-16T12:03:00.000Z | 4      | 3      | 1        |
| 2020-08-16T12:04:00.000Z | 4      | 3      | 1        |
| 2020-08-16T12:05:00.000Z | 3      | 3      | 0        |
| 2020-08-16T12:06:00.000Z | 3      | 4      | 1        |
| 2020-08-16T12:07:00.000Z | 3      | 5      | 2        |
| 2020-08-16T12:08:00.000Z | 6      | 5      | 1        |

---

[View on DB Fiddle](https://www.db-fiddle.com/f/p65hiAFVT4v3TrjTPbrZnC/0)

Please see this working fiddle.

answered Nov 15 '22 04:11

Mike Organek

Window functions and checking the neigbors. (you'll need an extra anti-selfjoin to remove the duplicates, and invent a tie-breaker for the stable marriage problem)

SELECT id,sensor_id, ztimestamp,value
        -- , prev_ts, next_ts
        , (ztimestamp - prev_ts) AS prev_span
        , (next_ts - ztimestamp) AS next_span
        , (sensor_id <> prev_sensor) AS prev_valid
        , (sensor_id <> next_sensor) AS next_valid
        , CASE WHEN (sensor_id <> prev_sensor AND sensor_id <> next_sensor) THEN
                CASE WHEN (ztimestamp - prev_ts) < (next_ts - ztimestamp) THEN prev_id ELSE next_id END
        WHEN (sensor_id <> prev_sensor) THEN prev_id
        WHEN (sensor_id <> next_sensor) THEN next_id
        ELSE NULL END AS best_neigbor
 FROM (
        SELECT id,sensor_id, ztimestamp,value
        , lag(id) OVER www AS prev_id
        , lead(id) OVER www AS next_id
        , lag(sensor_id) OVER www AS prev_sensor
        , lead(sensor_id) OVER www AS next_sensor
        , lag(ztimestamp) OVER www AS prev_ts
        , lead(ztimestamp) OVER www AS next_ts
        FROM measurement
        WINDOW www AS (order by ztimestamp)
        ) q
ORDER BY ztimestamp,sensor_id
        ;

Result:

DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 9
 id | sensor_id |     ztimestamp      | value | prev_span | next_span | prev_valid | next_valid | best_neigbor 
----+-----------+---------------------+-------+-----------+-----------+------------+------------+--------------
  1 |         1 | 2020-08-16 12:00:00 |     5 |           | 00:01:00  |            | t          |            2
  2 |         2 | 2020-08-16 12:01:00 |     6 | 00:01:00  | 00:01:00  | t          | t          |            3
  3 |         1 | 2020-08-16 12:02:00 |     4 | 00:01:00  | 00:00:00  | t          | t          |            4
  4 |         2 | 2020-08-16 12:02:00 |     7 | 00:00:00  | 00:01:00  | t          | f          |            3
  5 |         2 | 2020-08-16 12:03:00 |     3 | 00:01:00  | 00:02:00  | f          | t          |            6
  6 |         1 | 2020-08-16 12:05:00 |     3 | 00:02:00  | 00:01:00  | t          | t          |            7
  7 |         2 | 2020-08-16 12:06:00 |     4 | 00:01:00  | 00:01:00  | t          | f          |            6
  8 |         2 | 2020-08-16 12:07:00 |     5 | 00:01:00  | 00:01:00  | f          | t          |            9
  9 |         1 | 2020-08-16 12:08:00 |     6 | 00:01:00  |           | t          |            |            8
(9 rows)

answered Nov 15 '22 02:11

wildplasser

Related questions
                            
                                Can't connect to remote SQL Servers outside of network with PYODBC
                            
                                Best way to store Large String in SQL Server database?
                            
                                How to find and replace string in MYSQL Database for a particular string only
                            
                                SQL query - cost of using DISTINCT
                            
                                Why does SQL standard allow duplicate rows?
                            
                                How to save a string containing single quotes to a text column in PostgreSQL
                            
                                Passing in a list of items (that can possibly be null) as params for an IN clause using Dapper
                            
                                Spark SQL grouping: Add to group by or wrap in first() if you don't care which value you get.;
                            
                                Count rows that have the same relationships
                            
                                Recursively concat columns in sql
                            
                                Insert a list of dictionaries into an SQL table using python
                            
                                Batch Entry 0 insert into PGSQL-Call getNextException to see the cause
                            
                                Azure SQL high wait time on "VDI_CLIENT_OTHER"
                            
                                Deterministic string replace and collate
                            
                                How to prevent creating duplicates of edges between the same vertices in OrientDB?
                            
                                which one is effecient, join queries using sql, or merge queries using pandas?
                            
                                Calculate average for each month for a given date range
                            
                                Is the order of conditions in a where clause relevant if both sides are left joined entities?
                            
                                Getting Values From Json Data Inside Array in Mysql
                            
                                Delete rows from SQL server bases on content in dataframe

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

SQL Find pairs of rows with next best timestamp match

Tags:

sql

datetime

postgresql

window-functions

Graphical representation

Test data

My approach

Daniel

People also ask

4 Answers

Gordon Linoff

The Impaler

Mike Organek

wildplasser

Recent Activity

Donate For Us