
SQL GROUP BY Year, Month, Week, Day, Hour: SQL vs. Procedural Performance


I need to write a query that will group a large number of records by periods of time, at granularities from year down to hour.

My initial approach has been to work out the periods procedurally in C#, iterate through them, and run the following SQL for each period to get its data, building up the dataset as I go.

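```sql
-- Half-open range, so a row exactly on a period boundary is counted only once
SELECT SUM(someValues) FROM table1 WHERE deliveryDate >= @fromDate AND deliveryDate < @toDate;
```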
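The iterative approach can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module and an in-memory SQLite database rather than C# and SQL Server; the table contents and the helpers `make_demo_db` and `totals_per_day` are hypothetical. It runs one parameterized range query per day and uses a half-open range (`>=` start, `<` end), so a row stamped exactly at midnight is counted in one day only.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # ISO-formatted text sorts chronologically in SQLite


def make_demo_db():
    # Hypothetical table mirroring the question: table1(deliveryDate, someValues).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (deliveryDate TEXT, someValues INTEGER)")
    conn.execute("CREATE INDEX ix_delivery ON table1 (deliveryDate)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [
            ("2009-01-26 09:15:00", 5),
            ("2009-01-26 23:59:59", 7),
            ("2009-01-27 00:00:00", 3),  # exactly on a day boundary
            ("2009-01-27 10:01:00", 4),
        ],
    )
    return conn


def totals_per_day(conn, start, end):
    """Procedural approach: one range query per day in [start, end)."""
    results = []
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        # Half-open range [day, nxt): a row at exactly midnight falls in one day only.
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(someValues), 0) FROM table1 "
            "WHERE deliveryDate >= ? AND deliveryDate < ?",
            (day.strftime(FMT), nxt.strftime(FMT)),
        ).fetchone()
        results.append((day.strftime("%Y-%m-%d"), total))
        day = nxt
    return results


conn = make_demo_db()
print(totals_per_day(conn, datetime(2009, 1, 26), datetime(2009, 1, 28)))
# → [('2009-01-26', 12), ('2009-01-27', 7)]
```

With an inclusive `BETWEEN` on both ends instead, the `00:00:00` row would be added to both days. Note that every period in the loop produces a row, including empty ones (reported as 0 here).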

I've since discovered that I can group the records using YEAR(), MONTH(), DAY(), DATEPART(week, date) and DATEPART(hh, date).

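```sql
SELECT YEAR(deliveryDate) AS yr, MONTH(deliveryDate) AS mon, DAY(deliveryDate) AS dy, SUM(someValues) AS total
FROM table1 GROUP BY YEAR(deliveryDate), MONTH(deliveryDate), DAY(deliveryDate) ORDER BY yr, mon, dy;
```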
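For comparison, here is a sketch of the single-query approach, again assuming SQLite through Python's sqlite3 module. SQLite has no YEAR()/MONTH()/DAY()/DATEPART(), so `strftime()` builds one composite bucket key per granularity instead; the `PERIOD_FORMATS` mapping and the helper names are hypothetical. Week is left out on purpose: SQLite's `%W` numbering is not the same as DATEPART(week), which depends on the SET DATEFIRST setting.

```python
import sqlite3

# Hypothetical mapping from granularity to an SQLite strftime() format; these
# stand in for SQL Server's YEAR()/MONTH()/DAY()/DATEPART(hh, ...).
PERIOD_FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
}


def make_demo_db():
    # Hypothetical table mirroring the question: table1(deliveryDate, someValues).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (deliveryDate TEXT, someValues INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [
            ("2009-01-26 09:15:00", 5),
            ("2009-01-26 23:59:59", 7),
            ("2009-01-27 00:00:00", 3),
            ("2009-01-27 10:01:00", 4),
        ],
    )
    return conn


def totals_grouped(conn, granularity):
    """Single-query approach: bucket every row by period, then sum per bucket."""
    fmt = PERIOD_FORMATS[granularity]  # whitelisted, so safe to inline in the SQL
    return conn.execute(
        f"SELECT strftime('{fmt}', deliveryDate) AS period, SUM(someValues) "
        "FROM table1 GROUP BY period ORDER BY period"
    ).fetchall()


conn = make_demo_db()
print(totals_grouped(conn, "day"))  # → [('2009-01-26', 12), ('2009-01-27', 7)]
print(totals_grouped(conn, "month"))  # → [('2009-01', 19)]
```

Unlike the per-period loop, a GROUP BY only returns periods that actually contain rows, so empty periods have to be filled in afterwards if the report needs them.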

My concern is that using DATEPART in a GROUP BY will perform worse than running the query once per period, because the index on the datetime column can't be used as efficiently. Any thoughts on whether this is true?

Thanks.

asked Jan 27 '09 at 10:01 by RSlaughter


2 Answers

As with anything performance-related: measure.

Checking the query plan for the second approach will reveal any obvious problems in advance (for example, a full table scan when you know one is not needed), but there is no substitute for measuring. In SQL performance testing, that measurement should be done with realistically sized test data.

This is a complex case: you are not simply comparing two ways of writing a single query, but a single-query approach against an iterative one, so aspects of your environment may play a major role in the actual performance.

Specifically:

  1. the 'distance' between your application and the database, since the latency of each call is wasted time compared with the one-big-query approach;
  2. whether or not you are using prepared statements (without them, the database engine has to parse each query again);
  3. whether constructing the range queries is itself costly (heavily influenced by 2).
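A minimal sketch of such a measurement, assuming Python's built-in sqlite3 module and an in-memory database filled with synthetic data (the table, index and helper names are hypothetical). It checks that both approaches return the same per-day totals and times each one. Because SQLite runs in-process there is no network round trip, so point 1 is not represented at all, and the sqlite3 module caches compiled statements per connection by default, which roughly corresponds to point 2. Numbers from this harness will therefore not transfer to SQL Server over a network; treat it as a template for measuring in your own environment.

```python
import sqlite3
import time
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def build_db(n_rows):
    # Hypothetical table with synthetic, deterministic data: one row every 17 minutes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (deliveryDate TEXT, someValues INTEGER)")
    conn.execute("CREATE INDEX ix_delivery ON table1 (deliveryDate)")
    base = datetime(2009, 1, 1)
    rows = (
        ((base + timedelta(minutes=17 * i)).strftime(FMT), i % 10)
        for i in range(n_rows)
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    return conn


def per_period(conn, start, end):
    """Iterative approach: one parameterized range query per day."""
    out = []
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        (total,) = conn.execute(
            "SELECT SUM(someValues) FROM table1 "
            "WHERE deliveryDate >= ? AND deliveryDate < ?",
            (day.strftime(FMT), nxt.strftime(FMT)),
        ).fetchone()
        if total is not None:  # skip empty days so the output matches GROUP BY
            out.append((day.strftime("%Y-%m-%d"), total))
        day = nxt
    return out


def grouped(conn):
    """Single-query approach: GROUP BY the day bucket."""
    return conn.execute(
        "SELECT strftime('%Y-%m-%d', deliveryDate) AS d, SUM(someValues) "
        "FROM table1 GROUP BY d ORDER BY d"
    ).fetchall()


def timed(fn, *args):
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


conn = build_db(30_000)  # scale this up to match your real data volume
per_res, t_loop = timed(per_period, conn, datetime(2009, 1, 1), datetime(2010, 1, 1))
grp_res, t_group = timed(grouped, conn)
assert per_res == grp_res  # same answer either way; only the cost differs
print(f"one query per day: {t_loop:.4f}s, single GROUP BY: {t_group:.4f}s")
```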
answered Sep 21 '22 at 11:09 by ShuggyCoUk


If you apply a function to the column side of a comparison, you get a table scan.

The index is on field, not on DATEPART(field), so the expression has to be calculated for every row; I think your hunch is right.
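One way to see this effect is to compare query plans. The sketch below assumes SQLite via Python's sqlite3 module and its `EXPLAIN QUERY PLAN` statement; in SQL Server the equivalent check is the actual execution plan, and that optimizer may behave differently. The table, the index `ix_delivery` and the `plan` helper are hypothetical. It filters the same year two ways: once with a function wrapped around the indexed column, once as a range on the bare column. Note that this concerns filtering; a GROUP BY over the whole table, as in the question, has to read every row either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (deliveryDate TEXT, someValues INTEGER)")
conn.execute("CREATE INDEX ix_delivery ON table1 (deliveryDate)")


def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(r[-1]) for r in rows)


# Function applied to the indexed column: the index cannot be used to seek.
wrapped = plan(
    "SELECT SUM(someValues) FROM table1 WHERE strftime('%Y', deliveryDate) = ?",
    ("2009",),
)

# Same filter written as a range on the bare column: the index can be used.
bare = plan(
    "SELECT SUM(someValues) FROM table1 "
    "WHERE deliveryDate >= ? AND deliveryDate < ?",
    ("2009-01-01 00:00:00", "2010-01-01 00:00:00"),
)

print("wrapped:", wrapped)
print("bare:   ", bare)
```

The exact wording of the plan text varies between SQLite versions, but the wrapped version reports a scan of `table1` while the range version reports a search using `ix_delivery`.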

answered Sep 23 '22 at 11:09 by Galwegian