I have a table that contains a lot of data, where we particularly care about the date field. The data volume just went up ~30x, and the old approach will fall apart soon, so I'm hoping to optimize the query below.
The current table contains data in 5 second (+/- a little) intervals. I need to sample that table and get the record that falls closest to each 30 second interval.
What I have right now works just fine; I'm simply curious whether there is a way to optimize it further. If I can do it in Linq To SQL, that would be neat too. I am also interested in suggestions on indexes, given the number of date values (~2 million rows minimum).
declare @st datetime ; set @st = '2012-01-31 05:05:00';
declare @end datetime ; set @end = '2012-01-31 05:10:00';
select distinct
    log.* -- id,
from
    dbo.fn_GenerateDateSteps(@st, @end, 30) as d
    inner join lotsOfLogData log on log.Id = (
        select top 1 e.[Id]
        from
            lotsOfLogData as e -- contains data in 5 second intervals
        where
            e.stationId = 1000
            -- search for dates in a certain range
            AND e.utcTime between DateAdd(s, -10, d.dt) AND DateAdd(s, 5, d.dt)
        order by
            -- get the 'closest'. this can change a little, but will always
            -- be based on a difference between the dates
            abs(datediff(s, d.dt, e.UtcTime))
    )
(I have updated the query above to be correct: stationId should be inside the subquery.)
The table structure of lotsOfLogData is below. There are relatively few station IDs (maybe 50), but lots of records for each. We know the station id when we query.
create table lotsOfLogData (
Id bigint identity(1,1) not null
, StationId int not null
, UtcTime datetime not null
-- 20 other fields, used for other calculations
)
fn_GenerateDateSteps returns a dataset like this, for the parameters given:
[DT]
2012-01-31 05:05:00.000
2012-01-31 05:05:30.000
2012-01-31 05:06:00.000
2012-01-31 05:06:30.000 (and so on, every 30 seconds)
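For anyone who wants to reproduce this, an inline table-valued function along these lines generates that output (a sketch of one possible shape; the real function may differ):
-- A sketch of one way fn_GenerateDateSteps could be written; the actual
-- implementation is not shown in this post.
create function dbo.fn_GenerateDateSteps
    (@start datetime, @end datetime, @stepSeconds int)
returns table
as
return (
    with steps (dt) as (
        select @start
        union all
        select dateadd(s, @stepSeconds, dt)
        from steps
        where dateadd(s, @stepSeconds, dt) <= @end
    )
    select dt from steps
    -- callers may need OPTION (MAXRECURSION 0) on the outer query
    -- for ranges longer than 100 steps
);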
I have also done this with a table variable, in the manner below, but that came out just a little more expensive.
declare @dates table ( dt datetime, ClosestId bigint );
insert into @dates (dt) select dt from dbo.fn_GenerateDateSteps(@st, @end, 30);
update @dates set ClosestId = ( /* same subquery as above */ );
select log.* from lotsOfLogData log inner join @dates d on log.Id = d.ClosestId;
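Spelled out for completeness, the update uses the same subquery as the first query:
update d
set ClosestId = (
    select top 1 e.[Id]
    from lotsOfLogData as e
    where e.stationId = 1000
      and e.utcTime between DateAdd(s, -10, d.dt) AND DateAdd(s, 5, d.dt)
    order by abs(datediff(s, d.dt, e.UtcTime))
)
from @dates as d;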
Edit: Fixed up.
I now have 200K+ rows to work with. I tried both ways, and the cross apply with an appropriate index (id/time + includes(..all columns...); sketched below) worked just fine. However, I ended up with the query I started with, using the simpler (and existing) index on [id+time]. I settled on that one because it is the more widely understandable query. Maybe there still is a better way to do it, but I can't see it :D
-- subtree cost (cross apply) : .0808
-- subtree cost (id based)    : .0797
-- see above query for what I ended up with
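For reference, the covering index I tried was roughly of this shape (the name is mine, and since the include list is abbreviated above, the columns here are placeholders):
-- A sketch of the covering index described above; the name and the full
-- include list are illustrative, not the exact definition used.
create nonclustered index IX_lotsOfLogData_Station_UtcTime
    on lotsOfLogData (StationId, UtcTime)
    include (Id /* ...plus the other columns the query reads... */);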
You could try changing the inner join to a cross apply, and moving where log.stationid into the subselect.
SQL Statement
SELECT DISTINCT log.* -- id,
FROM dbo.fn_GenerateDateSteps(@st, @end, 30) AS d
CROSS APPLY (
SELECT TOP 1 log.*
FROM lotsOfLogData AS log -- contains data in 5 second intervals
WHERE -- search for dates in a certain range
utcTime between DATEADD(s, -10, d.dt) AND DATEADD(s, 5, d.dt)
AND log.stationid = 1000
ORDER BY
-- get the 'closest'. this can change a little, but will always
-- be based on a difference between the date
ABS(DATEDIFF(s, d.dt, UtcTime))
) log