Using SQL Server, I have a table that looks something like the following:
id | time | measurement
---+---------------------+-------------
1 | 2014-01-01T05:00:00 | 1.0
1 | 2014-01-01T06:45:00 | 2.0
1 | 2014-01-01T09:30:00 | 3.0
1 | 2014-01-01T11:00:00 | NULL
1 | 2014-02-05T03:00:00 | 1.0
1 | 2014-02-05T05:00:00 | NULL
The measurements being stored are presumed to be accurate until a new value is provided for the same id; the last measurement for a given id is the end of the sequence.
I'm interested in creating a query or view that synthesizes new data points on each hour defined by these spans if they don't exist (and the previous point was neither 0 nor NULL), thus:
id | time | measurement
---+---------------------+-------------
1 | 2014-01-01T05:00:00 | 1.0
1 | 2014-01-01T06:00:00 | 1.0
1 | 2014-01-01T06:45:00 | 2.0
1 | 2014-01-01T07:00:00 | 2.0
1 | 2014-01-01T08:00:00 | 2.0
1 | 2014-01-01T09:00:00 | 2.0
1 | 2014-01-01T09:30:00 | 3.0
1 | 2014-01-01T10:00:00 | 3.0
1 | 2014-02-05T03:00:00 | 1.0
1 | 2014-02-05T04:00:00 | 1.0
Is this feasible?
Would it be more feasible if each input row had a "duration", specifying the amount of time for which its measurement is valid? (In this case, we would be effectively unpacking a run-length encoding in SQL). [My target is SQL Server 2012, which has LEAD and LAG functions, allowing such to be easily constructed].
To provide that data in a format consumable by SQL Server:
select id, cast(stime as datetime) as [time], measurement
from
(values
(1, '2014-01-01T05:00:00', 1.0),
(1, '2014-01-01T06:45:00', 2.0),
(1, '2014-01-01T09:30:00', 3.0),
(1, '2014-01-01T11:00:00', NULL),
(1, '2014-02-05T03:00:00', 1.0),
(1, '2014-02-05T05:00:00', NULL)
) t(id, stime, measurement)
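For reference, the "duration" mentioned above could be derived with LEAD along these lines. This is only a sketch: src and duration_minutes are names made up for illustration, and the last row for each id gets a NULL duration because nothing follows it.

```sql
WITH src AS (
    SELECT id, CAST(stime AS datetime) AS [time], measurement
    FROM (VALUES
            (1, '2014-01-01T05:00:00', 1.0),
            (1, '2014-01-01T06:45:00', 2.0),
            (1, '2014-01-01T09:30:00', 3.0),
            (1, '2014-01-01T11:00:00', NULL),
            (1, '2014-02-05T03:00:00', 1.0),
            (1, '2014-02-05T05:00:00', NULL)
         ) t(id, stime, measurement)
)
SELECT id,
       [time],
       measurement,
       -- minutes until the next reading for the same id (NULL for the last row)
       DATEDIFF(minute, [time],
                LEAD([time]) OVER (PARTITION BY id ORDER BY [time])) AS duration_minutes
FROM src
ORDER BY id, [time];
```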
It's complex, but it works (for the dataset you provided):
;WITH cte AS (
SELECT *
FROM (VALUES
(1, '2014-01-01T05:00:00', '1.0'),(1, '2014-01-01T06:45:00', '2.0'),
(1, '2014-01-01T09:30:00', '3.0'),(1, '2014-01-01T11:00:00', NULL),
(1, '2014-02-05T03:00:00', '1.0'),(1, '2014-02-05T05:00:00', NULL)
) as t (id, [time], measurement)
)
--Get intervals for every date
, dates AS (
SELECT MIN([time]) [min], DATEADD(hour,-1,MAX([time])) [max]
FROM cte
GROUP BY CAST([time] as date)
)
--Create table with gaps datetimes
, add_dates AS (
SELECT CAST([min] as datetime) as date_
FROM dates
UNION ALL
SELECT DATEADD(hour,1,a.date_)
FROM add_dates a
INNER JOIN dates d
ON a.date_ between d.[min] and d.[max]
WHERE a.date_ < d.[max]
)
--Get intervals of datetimes with ids and measurements
, res AS (
SELECT id,
[time],
LEAD([time],1,NULL) OVER (ORDER BY [time])as [time1],
measurement
FROM cte
)
--Final select
SELECT DISTINCT *
FROM (
SELECT r.id,
a.date_,
r.measurement
FROM add_dates a
LEFT JOIN res r
ON a.date_ between r.time and r.time1
WHERE measurement IS NOT NULL
UNION ALL
SELECT *
FROM cte
WHERE measurement IS NOT NULL
) as t
ORDER BY t.date_
Output:
id date_ measurement
1 2014-01-01 05:00:00.000 1.0
1 2014-01-01 06:00:00.000 1.0
1 2014-01-01 06:45:00.000 2.0
1 2014-01-01 07:00:00.000 2.0
1 2014-01-01 08:00:00.000 2.0
1 2014-01-01 09:00:00.000 2.0
1 2014-01-01 09:30:00.000 3.0
1 2014-01-01 10:00:00.000 3.0
1 2014-02-05 03:00:00.000 1.0
1 2014-02-05 04:00:00.000 1.0
EDIT
First part: if you change the dates CTE to this:
, dates AS (
    SELECT DATEADD(hour, DATEPART(hour, MIN([time])),
                   CAST(CAST(MIN([time]) as date) as datetime)) [min],
           DATEADD(hour, -1, MAX([time])) [max]
    FROM cte
    GROUP BY CAST([time] as date)
)
This truncates the minute and second values from each interval's start time.
Second part: adding PARTITION BY id to the LEAD call keeps rows for different ids from being munged together:
, res AS (
SELECT id,
[time],
LEAD([time],1,NULL) OVER (PARTITION BY id ORDER BY [time])as [time1],
measurement
FROM cte
)
For the original dataset, the output will be the same.
DECLARE @t TABLE
(
id INT ,
t DATETIME ,
m MONEY
)
INSERT INTO @t
VALUES ( 1, '2014-01-01T05:00:00', 1.0 ),
( 1, '2014-01-01T06:45:00', 2.0 ),
( 1, '2014-01-01T09:30:00', 3.0 ),
( 1, '2014-01-01T11:00:00', NULL ),
( 1, '2014-02-05T03:00:00', 1.0 ),
( 1, '2014-02-05T05:00:00', NULL );
WITH
tal AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(i)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(i)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) c(i)),
rnk AS(SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY t) AS rn FROM @t),
itr AS(SELECT lr.id, rr.t, DATEADD(mi, 60 - DATEPART(mi, lr.t) , lr.t) AS wt, lr.m
FROM rnk lr
LEFT JOIN rnk rr ON lr.id = rr.id AND lr.rn = rr.rn - 1
WHERE lr.m IS NOT NULL AND lr.m <> 0)
SELECT * FROM @t WHERE m IS NOT NULL AND m <> 0
UNION ALL
SELECT i.id, DATEADD(hh, t.n, i.wt), i.m
FROM itr i
JOIN tal t ON DATEADD(hh, t.n, i.wt) < i.t
ORDER BY id, t
Breakdown:
1:
tal AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(i)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(i)
CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) c(i))
This returns the numbers 0, 1, 2, ..., 999, which covers approximately 41 days of consecutive one-hour intervals. If longer intervals are needed, just add more cross joins.
2:
rnk AS(SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY t) AS rn FROM @t)
This numbers your rows within each id and returns:
id t m rn
1 2014-01-01 05:00:00.000 1.00 1
1 2014-01-01 06:45:00.000 2.00 2
1 2014-01-01 09:30:00.000 3.00 3
1 2014-01-01 11:00:00.000 NULL 4
1 2014-02-05 03:00:00.000 1.00 5
1 2014-02-05 05:00:00.000 NULL 6
3:
itr AS(SELECT lr.id, rr.t, DATEADD(mi, 60 - DATEPART(mi, lr.t) , lr.t) AS wt, lr.m
FROM rnk lr
LEFT JOIN rnk rr ON lr.id = rr.id AND lr.rn = rr.rn - 1
WHERE lr.m IS NOT NULL AND lr.m <> 0)
This is the main part. It produces intervals: wt holds the first whole hour after the reading, and t holds the end of the interval:
id t wt m
1 2014-01-01 06:45:00.000 2014-01-01 06:00:00.000 1.00
1 2014-01-01 09:30:00.000 2014-01-01 07:00:00.000 2.00
1 2014-01-01 11:00:00.000 2014-01-01 10:00:00.000 3.00
1 2014-02-05 05:00:00.000 2014-02-05 04:00:00.000 1.00
4:
The last part takes all rows from the input table, filtering out NULL and 0 values, and unions them with a second set obtained by joining the intervals from step 3 to the tally table, which produces every whole hour inside each interval.
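Regarding step 1, here is a sketch of how the tally could be stretched to 10,000 numbers (0 to 9999, roughly 416 days of hourly steps) with a fourth cross join. The helper CTE d and the alias e are names introduced here only for illustration:

```sql
WITH
d AS (SELECT i
      FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(i)),  -- 10 rows
tal AS (SELECT -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM d a
        CROSS JOIN d b
        CROSS JOIN d c
        CROSS JOIN d e)   -- 10 * 10 * 10 * 10 = 10,000 rows
SELECT MIN(n) AS first_n, MAX(n) AS last_n, COUNT(*) AS total
FROM tal;   -- 0, 9999, 10000
```

Reusing one ten-row CTE also avoids repeating the VALUES list for every extra power of ten.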
I can't give a tested example since I don't have a SQL Server environment, but yes, it is very feasible.
You can do this by joining with a row generator using CTEs. Here is a row generator for dates: https://smehrozalam.wordpress.com/2009/06/09/t-sql-using-common-table-expressions-cte-to-generate-sequences/
Something like this:
With DateSequence( Date ) as
(
-- cast the anchor so its type matches the datetime returned by dateadd below
Select cast('2014-01-01T05:00:00' as datetime) as Date
union all
Select dateadd(hour, 1, Date)
from DateSequence
where Date < '2014-02-05T05:00:00'
)
Select * from DateSequence option (MaxRecursion 1000)
That will give you a table of all the times you need. Then outer join that to your data table and use the analytic functions to get the last non-null value.
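An untested sketch of that join, under a few assumptions: the data sits in a table variable called @data (a made-up name), and the analytic function is swapped for CROSS APPLY with TOP (1), because here a NULL measurement ends a run rather than being a gap to fill. Each hour picks up the latest reading at or before it, and hours whose latest reading is NULL or 0 are dropped.

```sql
DECLARE @data TABLE (id INT, [time] DATETIME, measurement DECIMAL(9, 1));

INSERT INTO @data (id, [time], measurement)
VALUES (1, '2014-01-01T05:00:00', 1.0),
       (1, '2014-01-01T06:45:00', 2.0),
       (1, '2014-01-01T09:30:00', 3.0),
       (1, '2014-01-01T11:00:00', NULL),
       (1, '2014-02-05T03:00:00', 1.0),
       (1, '2014-02-05T05:00:00', NULL);

WITH DateSequence ([Date]) AS
(
    SELECT CAST('2014-01-01T05:00:00' AS DATETIME)
    UNION ALL
    SELECT DATEADD(hour, 1, [Date])
    FROM DateSequence
    WHERE [Date] < '2014-02-05T05:00:00'
)
-- original readings that carry a usable value
SELECT d.id, d.[time], d.measurement
FROM @data d
WHERE d.measurement IS NOT NULL AND d.measurement <> 0
UNION   -- UNION rather than UNION ALL, so hourly points that repeat a reading are dropped
-- one row per id and hour, carrying the latest reading at or before that hour
SELECT i.id, s.[Date], p.measurement
FROM (SELECT DISTINCT id FROM @data) i
CROSS JOIN DateSequence s
CROSS APPLY (SELECT TOP (1) x.measurement
             FROM @data x
             WHERE x.id = i.id AND x.[time] <= s.[Date]
             ORDER BY x.[time] DESC) p
WHERE p.measurement IS NOT NULL AND p.measurement <> 0
ORDER BY id, [time]
OPTION (MAXRECURSION 1000);
```

The sequence spans 840 hourly steps for this data, so MAXRECURSION 1000 is enough; a longer date range would need a higher limit or a tally table instead.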