Synthesizing SQL rows within a range

Tags: sql, sql-server

Using SQL Server, I have a table that looks something like the following:

id | time                | measurement
---+---------------------+-------------
1  | 2014-01-01T05:00:00 | 1.0
1  | 2014-01-01T06:45:00 | 2.0
1  | 2014-01-01T09:30:00 | 3.0
1  | 2014-01-01T11:00:00 | NULL
1  | 2014-02-05T03:00:00 | 1.0
1  | 2014-02-05T05:00:00 | NULL

The measurements being stored are presumed to be accurate until a new value is provided for the same id; the last measurement for a given id is the end of the sequence.

I'm interested in creating a query or view that synthesizes new data points on each hour defined by these spans if they don't exist (and the previous point was neither 0 nor NULL), thus:

id | time                | measurement
---+---------------------+-------------
1  | 2014-01-01T05:00:00 | 1.0
1  | 2014-01-01T06:00:00 | 1.0
1  | 2014-01-01T06:45:00 | 2.0
1  | 2014-01-01T07:00:00 | 2.0
1  | 2014-01-01T08:00:00 | 2.0
1  | 2014-01-01T09:00:00 | 2.0
1  | 2014-01-01T09:30:00 | 3.0
1  | 2014-01-01T10:00:00 | 3.0
1  | 2014-02-05T03:00:00 | 1.0
1  | 2014-02-05T04:00:00 | 1.0

Is this feasible?

Would it be more feasible if each input row had a "duration" specifying how long its measurement is valid? (In that case, we would effectively be unpacking a run-length encoding in SQL.) My target is SQL Server 2012, which has the LEAD and LAG functions, so such a column would be easy to construct.
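
As a minimal sketch of that idea (not part of the original question), a per-row duration can be derived with LEAD from the time of the next row for the same id. The CTE name src and the column name duration_minutes are assumptions made for illustration; the data is the sample from this question.

```sql
-- Sketch only: derive a per-row "duration" from the next row's time.
-- src inlines the sample data; duration_minutes is a hypothetical column name.
WITH src AS (
    SELECT id, CAST(stime AS datetime) AS [time], measurement
    FROM (VALUES
        (1, '2014-01-01T05:00:00', 1.0),
        (1, '2014-01-01T06:45:00', 2.0),
        (1, '2014-01-01T09:30:00', 3.0),
        (1, '2014-01-01T11:00:00', NULL),
        (1, '2014-02-05T03:00:00', 1.0),
        (1, '2014-02-05T05:00:00', NULL)
    ) t(id, stime, measurement)
)
SELECT id,
       [time],
       measurement,
       -- NULL for the last row of each id, which has no following row
       DATEDIFF(minute, [time],
                LEAD([time]) OVER (PARTITION BY id ORDER BY [time])) AS duration_minutes
FROM src
ORDER BY id, [time];
```

For the first sample row (05:00, followed by 06:45) this gives a duration of 105 minutes.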


To provide that data in a format consumable by SQL Server:

select id, cast(stime as datetime) as [time], measurement
from
(values
    (1, '2014-01-01T05:00:00', 1.0),
    (1, '2014-01-01T06:45:00', 2.0),
    (1, '2014-01-01T09:30:00', 3.0),
    (1, '2014-01-01T11:00:00', NULL),
    (1, '2014-02-05T03:00:00', 1.0),
    (1, '2014-02-05T05:00:00', NULL)
) t(id, stime, measurement)
Charles Duffy asked May 22 '16 18:05


3 Answers

It's complex, but it works (for the dataset you provided):

;WITH cte AS (
SELECT *
FROM (VALUES
(1, '2014-01-01T05:00:00', '1.0'),(1, '2014-01-01T06:45:00', '2.0'),
(1, '2014-01-01T09:30:00', '3.0'),(1, '2014-01-01T11:00:00', NULL),
(1, '2014-02-05T03:00:00', '1.0'),(1, '2014-02-05T05:00:00', NULL)
) as t (id, [time], measurement)
)
--Get intervals for every date
, dates AS (
SELECT MIN([time]) [min], DATEADD(hour,-1,MAX([time])) [max]
FROM cte
GROUP BY CAST([time] as date)
)
--Create table with gaps datetimes
, add_dates AS (
SELECT CAST([min] as datetime) as date_
FROM dates
UNION ALL
SELECT DATEADD(hour,1,a.date_)
FROM add_dates a
INNER JOIN dates d 
    ON a.date_ between d.[min] and d.[max]
WHERE a.date_ < d.[max]
)
--Get intervals of datetimes with ids and measurements
, res AS (
SELECT  id,
        [time],
        LEAD([time],1,NULL) OVER (ORDER BY [time])as [time1],
        measurement
FROM cte
)
--Final select
SELECT DISTINCT *
FROM (
    SELECT  r.id,
            a.date_,
            r.measurement
    FROM add_dates a
    LEFT JOIN res r
        ON a.date_ between r.time and r.time1
    WHERE measurement IS NOT NULL
    UNION ALL
    SELECT * 
    FROM cte
    WHERE measurement IS NOT NULL
) as t
ORDER BY t.date_

Output:

id  date_                   measurement
1   2014-01-01 05:00:00.000 1.0
1   2014-01-01 06:00:00.000 1.0
1   2014-01-01 06:45:00.000 2.0
1   2014-01-01 07:00:00.000 2.0
1   2014-01-01 08:00:00.000 2.0
1   2014-01-01 09:00:00.000 2.0
1   2014-01-01 09:30:00.000 3.0
1   2014-01-01 10:00:00.000 3.0
1   2014-02-05 03:00:00.000 1.0
1   2014-02-05 04:00:00.000 1.0

EDIT

First part

Replace the dates CTE with this:

, dates AS (
SELECT DATEADD(hour,DATEPART(hour,MIN([time])),CAST(CAST(MIN([time]) as date) as datetime)) [min], DATEADD(hour,-1,MAX([time])) [max]
FROM cte
GROUP BY CAST([time] as date)
)

This truncates the minute and second values from the start of each date's range, so the generated rows fall on whole hours.
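
The same truncation to the hour is often written more compactly by counting whole hours from a fixed base date and adding them back. This is a sketch of that common idiom, not part of the original answer; the alias v and the column names are made up for the example.

```sql
-- Sketch: truncate a datetime to the hour.
-- DATEDIFF(hour, 0, t) counts hour boundaries since the base date 1900-01-01;
-- DATEADD adds that many hours back onto the base date, dropping minutes and seconds.
SELECT v.t AS original_time,
       DATEADD(hour, DATEDIFF(hour, 0, v.t), 0) AS truncated_to_hour
FROM (VALUES (CAST('2014-01-01T06:45:00' AS datetime)),
             (CAST('2014-01-01T09:30:00' AS datetime))) v(t);
-- 06:45 becomes 06:00 and 09:30 becomes 09:00
```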

Second part

Adding PARTITION BY id to the LEAD call keeps rows with different ids from being mixed together:

, res AS (
SELECT  id,
        [time],
        LEAD([time],1,NULL) OVER (PARTITION BY id ORDER BY [time])as [time1],
        measurement
FROM cte
)

For the original dataset, the output is the same.

gofr1 answered Oct 31 '22 22:10


DECLARE @t TABLE
    (
      id INT ,
      t DATETIME ,
      m MONEY
    )
INSERT  INTO @t
VALUES  ( 1, '2014-01-01T05:00:00', 1.0 ),
        ( 1, '2014-01-01T06:45:00', 2.0 ),
        ( 1, '2014-01-01T09:30:00', 3.0 ),
        ( 1, '2014-01-01T11:00:00', NULL ),
        ( 1, '2014-02-05T03:00:00', 1.0 ),
        ( 1, '2014-02-05T05:00:00', NULL );

WITH 
tal AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n 
       FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(i)
       CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(i)
       CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) c(i)),

rnk AS(SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY t) AS rn FROM @t),

itr AS(SELECT lr.id, rr.t, DATEADD(mi, 60 - DATEPART(mi, lr.t) , lr.t) AS wt, lr.m
       FROM rnk lr 
       LEFT JOIN rnk rr ON lr.id = rr.id AND lr.rn = rr.rn - 1
       WHERE lr.m IS NOT NULL AND lr.m <> 0)

SELECT * FROM @t WHERE m IS NOT NULL AND m <> 0
UNION ALL
SELECT i.id, DATEADD(hh, t.n, i.wt), i.m
FROM itr i
JOIN tal t ON DATEADD(hh, t.n, i.wt) < i.t
ORDER BY id, t

Breakdown:

1:

tal AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n 
       FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(i)
       CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(i)
       CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) c(i))

This returns the numbers 0, 1, 2, ..., 999, which covers about 41 days of consecutive one-hour intervals. If you need longer spans, add more cross joins.
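
As a rough sketch of that extension, a fourth cross join raises the range to 0 through 9999 (a little over 416 days of hours). Here the ten-row VALUES list is factored into a helper CTE named d, which is an assumption of this example and not part of the answer's query.

```sql
-- Sketch: a tally of 10,000 numbers built from four cross joins of a ten-row set.
WITH d(i) AS (
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(i)
),
tal AS (
    SELECT -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM d a
    CROSS JOIN d b
    CROSS JOIN d c
    CROSS JOIN d e
)
-- Summarize the range instead of listing all rows
SELECT MIN(n) AS first_n, MAX(n) AS last_n, COUNT(*) AS row_count
FROM tal;
-- first_n = 0, last_n = 9999, row_count = 10000
```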

2:

rnk AS(SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY t) AS rn FROM @t)

This numbers the rows within each id in time order and returns:

id  t                       m       rn
1   2014-01-01 05:00:00.000 1.00    1
1   2014-01-01 06:45:00.000 2.00    2
1   2014-01-01 09:30:00.000 3.00    3
1   2014-01-01 11:00:00.000 NULL    4
1   2014-02-05 03:00:00.000 1.00    5
1   2014-02-05 05:00:00.000 NULL    6

3:

itr AS(SELECT lr.id, rr.t, DATEADD(mi, 60 - DATEPART(mi, lr.t) , lr.t) AS wt, lr.m
       FROM rnk lr 
       LEFT JOIN rnk rr ON lr.id = rr.id AND lr.rn = rr.rn - 1
       WHERE lr.m IS NOT NULL AND lr.m <> 0)

This is the main part. It produces the intervals: wt holds the first whole hour after the row's time, and t holds the time of the next row for the same id, which is the end of the interval:

id  t                       wt                      m
1   2014-01-01 06:45:00.000 2014-01-01 06:00:00.000 1.00
1   2014-01-01 09:30:00.000 2014-01-01 07:00:00.000 2.00
1   2014-01-01 11:00:00.000 2014-01-01 10:00:00.000 3.00
1   2014-02-05 05:00:00.000 2014-02-05 04:00:00.000 1.00

4:

The last part takes all rows from the input table, filtering out NULL and 0 values, and unions them with a second set obtained by joining the intervals above to the tally table to produce every whole hour inside each interval.

Giorgi Nakeuri answered Nov 01 '22 00:11


I can't give a working example since I don't have a SQL Server environment, but yes, it is very feasible.

You can do this by joining with a row generator using CTEs. Here is a row generator for dates: https://smehrozalam.wordpress.com/2009/06/09/t-sql-using-common-table-expressions-cte-to-generate-sequences/

Something like this:

With DateSequence( Date ) as
(
-- The anchor must be cast to datetime so its type matches the recursive part
Select cast('2014-01-01T05:00:00' as datetime) as Date
    union all
Select dateadd(hour, 1, Date)
    from DateSequence
    where Date < '2014-02-05T05:00:00'
)
Select * from DateSequence option (MaxRecursion 1000)

That will give you a table of all the times you need. Then outer join it to your data table and use the analytic functions to get the last non-null value.
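
One way the join step might look, as a sketch only and not the answer author's tested code: for each generated hour, CROSS APPLY picks the most recent stored row at or before that hour, and the hour is kept only if that row's measurement is neither NULL nor 0. It assumes a single id, as in the sample data, and inlines that data in a CTE named src (a name chosen for this example). It uses CROSS APPLY with TOP (1) in place of analytic functions, which is a swapped-in technique.

```sql
-- Sketch: hourly sequence joined to the latest stored row at or before each hour.
-- Assumes one id only; src inlines the question's sample data.
WITH src AS (
    SELECT id, CAST(stime AS datetime) AS [time], measurement
    FROM (VALUES
        (1, '2014-01-01T05:00:00', 1.0),
        (1, '2014-01-01T06:45:00', 2.0),
        (1, '2014-01-01T09:30:00', 3.0),
        (1, '2014-01-01T11:00:00', NULL),
        (1, '2014-02-05T03:00:00', 1.0),
        (1, '2014-02-05T05:00:00', NULL)
    ) t(id, stime, measurement)
),
DateSequence([Date]) AS (
    SELECT CAST('2014-01-01T05:00:00' AS datetime)
    UNION ALL
    SELECT DATEADD(hour, 1, [Date])
    FROM DateSequence
    WHERE [Date] < '2014-02-05T05:00:00'
)
-- Stored rows that carry a usable measurement
SELECT s.id, s.[time], s.measurement
FROM src s
WHERE s.measurement IS NOT NULL AND s.measurement <> 0
UNION   -- UNION (not UNION ALL) drops hours that coincide with a stored row
-- Synthesized rows on each whole hour
SELECT p.id, ds.[Date], p.measurement
FROM DateSequence ds
CROSS APPLY (
    SELECT TOP (1) s.id, s.measurement
    FROM src s
    WHERE s.[time] <= ds.[Date]
    ORDER BY s.[time] DESC
) p
WHERE p.measurement IS NOT NULL AND p.measurement <> 0
ORDER BY id, [time]
OPTION (MAXRECURSION 1000);
```

The sequence spans 840 hours, so it stays within the MAXRECURSION 1000 limit; a longer range would need a higher limit or a tally table instead of recursion.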

NP-complete answered Oct 31 '22 22:10