I need to write a query that will group a large number of records by periods of time from Year to Hour.
My initial approach has been to decide the periods procedurally in C#, iterate through each and run the SQL to get the data for that period, building up the dataset as I go.
SELECT Sum(someValues) FROM table1 WHERE deliveryDate BETWEEN @fromDate AND @toDate
I've subsequently discovered I can group the records using Year(), Month(), Day(), datepart(week, date) and datepart(hh, date).
SELECT Sum(someValues) FROM table1 GROUP BY Year(deliveryDate), Month(deliveryDate), Day(deliveryDate)
My concern is that using datepart in a GROUP BY will perform worse than running the query once per period, because the index on the datetime field cannot be used as efficiently. Any thoughts as to whether this is true?
Thanks.
To speed up frequent sorts, use an int (or an integer-based) data type if possible. SQL Server sorts integer data faster than character data.
The number of rows in a query impacts how quickly it is processed. In other words, the larger the number of rows, the higher the probability of slow performance. Simply shrinking the size of the table by removing redundant data can easily remedy this issue.
There are a number of things that may cause a query to take longer to execute. Inefficient query: using non-indexed columns in lookups or joins, so MySQL takes longer to match the condition. Table lock: the table is locked, by a global lock or an explicit table lock, when the query tries to access it.
BETWEEN is faster because it needs fewer comparisons. With an IN clause, each element is checked every time. But the two serve different purposes: BETWEEN is used when you are comparing against a range of values in some kind of sequence.
As with anything performance related: measure.
Checking the query plan for the second approach will tell you about any obvious problems in advance (a full table scan when you know one is not needed), but there is no substitute for measuring. In SQL performance testing, that measurement should be done with appropriately sized test data.
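As a rough illustration of what such a measurement could look like, here is a minimal sketch that runs both approaches against the same test data and checks that they agree before comparing timings. It is an assumption-laden stand-in: it uses Python with an in-memory SQLite database rather than C# and SQL Server, SQLite's date() in place of Year()/Month()/Day(), and hypothetical helper names (build_db, grouped, iterative, timed). The timings it prints say nothing about SQL Server; only the shape of the test carries over.

```python
import sqlite3
import time
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # ISO-style text sorts in chronological order

def build_db(n_rows=5000):
    """Create an in-memory table with one row per hour and an index on the date column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (deliveryDate TEXT, someValues INTEGER)")
    start = datetime(2020, 1, 1)
    rows = [((start + timedelta(hours=i)).strftime(FMT), i % 7) for i in range(n_rows)]
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX ix_delivery ON table1 (deliveryDate)")
    return conn

def grouped(conn):
    """Single query: let the database bucket the rows by calendar day."""
    sql = ("SELECT date(deliveryDate), SUM(someValues) FROM table1 "
           "GROUP BY date(deliveryDate) ORDER BY date(deliveryDate)")
    return [(day, total) for day, total in conn.execute(sql)]

def iterative(conn, first_day, n_days):
    """One range query per day, with the periods decided in application code."""
    sql = ("SELECT SUM(someValues) FROM table1 "
           "WHERE deliveryDate >= ? AND deliveryDate < ?")
    out = []
    for k in range(n_days):
        lo = first_day + timedelta(days=k)
        hi = lo + timedelta(days=1)
        total = conn.execute(sql, (lo.strftime(FMT), hi.strftime(FMT))).fetchone()[0]
        if total is not None:  # SUM over zero rows is NULL: skip empty periods
            out.append((lo.strftime("%Y-%m-%d"), total))
    return out

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0

conn = build_db()
grouped_rows, t_grouped = timed(grouped, conn)
iter_rows, t_iter = timed(iterative, conn, datetime(2020, 1, 1), 210)
print(f"grouped: {t_grouped:.4f}s  iterative: {t_iter:.4f}s  "
      f"same result: {grouped_rows == iter_rows}")
```

Against the real database, the same comparison would need realistic row counts and several repeated runs, since caching and network round trips for the iterative version can dominate the result.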
Since this is a complex case (you are not simply comparing two different ways to write a single query, but comparing a single-query approach against an iterative one), aspects of your environment may play a major role in the actual performance.
Specifically
If you put a formula into the field part of a comparison, you get a table scan.
The index is on the field, not on datepart(field), so the expression must be calculated for every row - so I think your hunch is right.
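The effect described above can be seen in a query plan. The sketch below is only an analogy: it uses Python with an in-memory SQLite database instead of SQL Server, strftime() in place of datepart(), and a hypothetical index name (ix_delivery). It compares the plan for a predicate that wraps the column in a function with the plan for a plain range on the bare column. SQL Server's optimizer is a different piece of software, so its actual execution plan should be checked rather than inferred from this.

```python
import sqlite3

plan_conn = sqlite3.connect(":memory:")
plan_conn.execute("CREATE TABLE table1 (deliveryDate TEXT, someValues INTEGER)")
# Hypothetical index name; the index is on the bare column, not on an expression.
plan_conn.execute("CREATE INDEX ix_delivery ON table1 (deliveryDate)")

def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in plan_conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Function applied to the column: the index on deliveryDate cannot be
# used to narrow the rows, so every row has to be evaluated.
wrapped = plan(
    "SELECT SUM(someValues) FROM table1 "
    "WHERE strftime('%Y', deliveryDate) = '2020'"
)

# Same period expressed as a half-open range on the bare column:
# the index can be searched for the matching rows.
ranged = plan(
    "SELECT SUM(someValues) FROM table1 "
    "WHERE deliveryDate >= '2020-01-01' AND deliveryDate < '2021-01-01'"
)

print("function on column:", wrapped)  # typically a SCAN of table1
print("range on column:   ", ranged)   # typically a SEARCH using ix_delivery
```

The exact wording of the plan text varies between SQLite versions, which is why the comments only say "typically".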