I have a database table containing one-minute periods of Open, Close, High, Low, Volume values for a security. I'm using SQL Server 2017, but 2019 RC is an option.
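I am trying to find an efficient SQL Server query that can aggregate these into 5-minute windows, where Open is the first Open in the window, Close is the last Close, High is the maximum High, Low is the minimum Low, and Volume is the average Volume.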
Ideally this query would account for gaps in the data, i.e. be based on date calculations rather than counting preceding / following rows.
For example say I have (here's 6 mins of data):
| Time             | Open | Close | High | Low | Volume |
|------------------|------|-------|------|-----|--------|
| 2019-10-30 09:30 | 5    | 10    | 15   | 1   | 125000 |
| 2019-10-30 09:31 | 10   | 15    | 20   | 5   | 100000 |
| 2019-10-30 09:32 | 15   | 20    | 25   | 10  | 120000 |
| 2019-10-30 09:33 | 20   | 25    | 30   | 15  | 10000  |
| 2019-10-30 09:34 | 20   | 22    | 40   | 2   | 13122  |
| 2019-10-30 09:35 | 22   | 30    | 35   | 4   | 15000  |

The 09:35 row is not factored in, since it would be the first row of the next 5-minute window.
I am trying to write a query that would give me (here's the first example of the 5-minute aggregate):
| Time             | Open | Close | High | Low | Volume  |
|------------------|------|-------|------|-----|---------|
| 2019-10-30 09:30 | 5    | 30    | 40   | 1   | 50224.4 |
Any tips? I'm banging my head against the wall with the OVER clause and its PARTITION / RANGE options.
The gist of the problem is rounding datetime values down to a 5-minute boundary, which (assuming that the datatype is datetime) can be done using DATEADD(MINUTE, DATEDIFF(MINUTE, 0, time) / 5 * 5, 0). The rest is basic grouping/window functions:
WITH cte AS (
    SELECT clamped_time
         , [Open]
         , [Close]
         , [High]
         , [Low]
         , [Volume]
         -- rn1 = 1 marks the earliest row of each 5-minute bucket, rn2 = 1 the latest
         , rn1 = ROW_NUMBER() OVER (PARTITION BY clamped_time ORDER BY [Time])
         , rn2 = ROW_NUMBER() OVER (PARTITION BY clamped_time ORDER BY [Time] DESC)
    FROM t
    CROSS APPLY (
        -- round [Time] down to the start of its 5-minute bucket
        SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [Time]) / 5 * 5, 0)
    ) AS x(clamped_time)
)
SELECT clamped_time
     , MIN(CASE WHEN rn1 = 1 THEN [Open] END) AS [Open]
     , MIN(CASE WHEN rn2 = 1 THEN [Close] END) AS [Close]
     , MAX([High]) AS [High]
     , MIN([Low]) AS [Low]
     , AVG([Volume]) AS [Volume]
FROM cte
GROUP BY clamped_time
Demo on db<>fiddle
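To see what the rounding expression does on its own, here is a minimal sketch that applies it to a few hand-picked timestamps. The sample values and the derived table alias v are assumptions made for illustration; only the expression itself comes from the answer above.

```sql
-- Hypothetical sample timestamps; v and its [Time] column exist only for this illustration.
SELECT v.[Time]
     -- whole minutes since 1900-01-01, integer-divided by 5 and multiplied back,
     -- which rounds down to the start of the 5-minute bucket
     , DATEADD(MINUTE, DATEDIFF(MINUTE, 0, v.[Time]) / 5 * 5, 0) AS clamped_time
FROM (VALUES
        (CAST('2019-10-30T09:30:00' AS datetime)),
        (CAST('2019-10-30T09:34:00' AS datetime)),
        (CAST('2019-10-30T09:35:00' AS datetime)),
        (CAST('2019-10-30T09:39:59' AS datetime))
     ) AS v([Time]);
```

The first two values should map to 09:30 and the last two to 09:35. Because the bucket is derived from the timestamp rather than from row positions, a missing minute simply leaves fewer rows in that bucket.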
You want to analyze the data in 5-minute intervals. You could use window functions with the following partitioning clause:
partition by datepart(year, t.[time]),
datepart(month, t.[time]),
datepart(day, t.[time]),
datepart(hour, t.[time]),
(datepart(minute, t.[time]) / 5)
Query:
select *
from (
    select
        t.[time],
        row_number() over(
            partition by datepart(year, [time]),
                         datepart(month, [time]),
                         datepart(day, [time]),
                         datepart(hour, [time]),
                         (datepart(minute, [time]) / 5)
            order by [time]
        ) [rn],
        first_value([open]) over(
            partition by datepart(year, [time]),
                         datepart(month, [time]),
                         datepart(day, [time]),
                         datepart(hour, [time]),
                         (datepart(minute, [time]) / 5)
            order by [time]
        ) [open],
        -- an explicit frame is required here: with the default frame (up to the
        -- current row), last_value() would return the current row's close
        last_value([close]) over(
            partition by datepart(year, [time]),
                         datepart(month, [time]),
                         datepart(day, [time]),
                         datepart(hour, [time]),
                         (datepart(minute, [time]) / 5)
            order by [time]
            rows between unbounded preceding and unbounded following
        ) [close],
        max([high]) over (
            partition by datepart(year, [time]),
                         datepart(month, [time]),
                         datepart(day, [time]),
                         datepart(hour, [time]),
                         (datepart(minute, [time]) / 5)
        ) [high],
        min([low]) over (
            partition by datepart(year, [time]),
                         datepart(month, [time]),
                         datepart(day, [time]),
                         datepart(hour, [time]),
                         (datepart(minute, [time]) / 5)
        ) [low],
        avg([volume]) over (
            partition by datepart(year, [time]),
                         datepart(month, [time]),
                         datepart(day, [time]),
                         datepart(hour, [time]),
                         (datepart(minute, [time]) / 5)
        ) [volume]
    from mytable t
) t
where rn = 1
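The window-function approach can also be written with the bucket computed once instead of repeating the five datepart expressions in every OVER clause. The following is only a sketch under assumptions: the table variable @bars, the decimal column types, and the name bucket_start are hypothetical, and the rows are the sample data from the question. It spells out the frame for LAST_VALUE, because the default frame ends at the current row and would otherwise return that row's own Close.

```sql
-- Hypothetical table variable holding the six sample rows from the question.
DECLARE @bars TABLE (
    [Time]   datetime      NOT NULL PRIMARY KEY,
    [Open]   decimal(18,4) NOT NULL,
    [Close]  decimal(18,4) NOT NULL,
    [High]   decimal(18,4) NOT NULL,
    [Low]    decimal(18,4) NOT NULL,
    [Volume] decimal(18,4) NOT NULL  -- decimal so AVG keeps its fractional part
);

INSERT INTO @bars ([Time], [Open], [Close], [High], [Low], [Volume])
VALUES ('2019-10-30T09:30:00',  5, 10, 15,  1, 125000),
       ('2019-10-30T09:31:00', 10, 15, 20,  5, 100000),
       ('2019-10-30T09:32:00', 15, 20, 25, 10, 120000),
       ('2019-10-30T09:33:00', 20, 25, 30, 15,  10000),
       ('2019-10-30T09:34:00', 20, 22, 40,  2,  13122),
       ('2019-10-30T09:35:00', 22, 30, 35,  4,  15000);

-- Every row of a bucket carries the same window results, so DISTINCT
-- collapses each bucket to a single output row.
SELECT DISTINCT
       x.bucket_start
     , FIRST_VALUE(b.[Open]) OVER (
           PARTITION BY x.bucket_start ORDER BY b.[Time]
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [Open]
     , LAST_VALUE(b.[Close]) OVER (
           PARTITION BY x.bucket_start ORDER BY b.[Time]
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [Close]
     , MAX(b.[High])   OVER (PARTITION BY x.bucket_start) AS [High]
     , MIN(b.[Low])    OVER (PARTITION BY x.bucket_start) AS [Low]
     , AVG(b.[Volume]) OVER (PARTITION BY x.bucket_start) AS [Volume]
FROM @bars AS b
CROSS APPLY (
    -- start of the 5-minute bucket that b.[Time] falls into
    SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, b.[Time]) / 5 * 5, 0)
) AS x(bucket_start)
ORDER BY x.bucket_start;
```

Run as a single batch, this should return one row per bucket: one starting at 09:30 built from the first five rows, and one starting at 09:35 containing only the last row. If the real Volume column is an integer type, AVG performs integer division, so a cast to decimal may be needed to keep the fraction.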