
Is SQL DATEDIFF(year, ..., ...) an Expensive Computation?

I'm trying to optimize some horrendously complicated SQL queries because they take too long to finish.

My queries are dynamically created SQL statements that call the same functions over and over, so I created a temporary table where each function is only called once instead of many, many times - this cut my execution time by 3/4.

So my question is, can I expect to see much of a difference if say, 1,000 datediff computations are narrowed to 100?

EDIT: The query looks like this:

SELECT DISTINCT M.MID, M.RE
FROM #TEMP
INNER JOIN M ON #TEMP.MID = M.MID
WHERE ( #TEMP.Property1 = 1 )
  AND DATEDIFF( year, M.DOB, @date2 ) >= 15
  AND DATEDIFF( year, M.DOB, @date2 ) <= 17

These queries are generated dynamically as strings (put together in bits and pieces) and then executed, so that various parameters can change on each iteration - mainly the last lines, which contain all sorts of DATEDIFF conditions.

There are about 420 queries like this, each calculating its datediffs this way. I know that I can pull them all into a temp table easily (1,000 datediffs becomes 50) - but is it worth it? Will it make a difference measured in seconds? I'm hoping for an improvement bigger than tenths of a second.
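
For reference, the temp-table idea described above might look roughly like the sketch below. The table name #AGES, the column name YearDiff, and the value assigned to @date2 are hypothetical; M, #TEMP, MID, DOB and Property1 are taken from the query above and are assumed to exist already.

```sql
-- Hypothetical value; in the real code @date2 is set elsewhere.
DECLARE @date2 DATETIME
SET @date2 = '20100101'

-- Compute the year difference once per row of M and store it.
SELECT M.MID, DATEDIFF(year, M.DOB, @date2) AS YearDiff
INTO #AGES            -- hypothetical temp table name
FROM M

-- Each generated query then filters on the stored value
-- instead of calling DATEDIFF again.
SELECT DISTINCT M.MID, M.RE
FROM #TEMP
INNER JOIN M ON #TEMP.MID = M.MID
INNER JOIN #AGES ON #AGES.MID = M.MID
WHERE #TEMP.Property1 = 1
  AND #AGES.YearDiff >= 15 AND #AGES.YearDiff <= 17
```

This only pays off if the same @date2 is reused across many of the generated queries; whether it saves seconds rather than tenths of a second would have to be measured.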

rlb.usa asked Dec 12 '22 22:12


1 Answer

To be honest, the extent of the performance hit depends on exactly what you are doing.

For example, if you wrap a column in DATEDIFF (or indeed any other function) within a WHERE clause, performance will suffer because it prevents an index on that column from being used effectively.

A basic example, finding all records in 2009:

WHERE DATEDIFF(yyyy, DateColumn, '2009-01-01') = 0

would not make good use of an index on DateColumn, whereas a better solution that allows optimal index usage would be:

WHERE DateColumn >= '2009-01-01' AND DateColumn < '2010-01-01'

I recently blogged about the difference this makes (with performance stats/execution plan comparisons), if you're interested.

Using DATEDIFF in the WHERE clause like that would be costlier than, say, returning DATEDIFF as a column in the result set.

I would start by identifying the individual queries that are taking the most time. Check the execution plans to see where the problem lies and tune from there.
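
As a rough sketch of how such a comparison could be set up: the table dbo.DemoEvents and its index below are hypothetical and exist only for illustration. SET STATISTICS IO and SET STATISTICS TIME make SQL Server report logical reads and CPU/elapsed time per statement, which can be read alongside the actual execution plan.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE dbo.DemoEvents (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    DateColumn DATETIME NOT NULL
);
CREATE INDEX IX_DemoEvents_DateColumn ON dbo.DemoEvents (DateColumn);

-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Function wrapped around the column: typically results in a scan.
SELECT COUNT(*)
FROM dbo.DemoEvents
WHERE DATEDIFF(yyyy, DateColumn, '20090101') = 0;

-- Bare column compared against a range: eligible for an index seek.
SELECT COUNT(*)
FROM dbo.DemoEvents
WHERE DateColumn >= '20090101' AND DateColumn < '20100101';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The table needs a realistic amount of data before the numbers mean anything; on an empty or tiny table both statements will look about the same.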

Edit: Based on the example query you've given, here's an approach you could try out to remove the use of DATEDIFF within the WHERE clause. Basic example to find everyone who was 10 years old on a given date - I think the maths is right, but you get the idea anyway! Gave it a quick test, and seems fine. Should be easy enough to adapt to your scenario. If you want to find people between (e.g.) 15 and 17 years old on a given date, then that's also possible with this approach.

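-- Assuming @Date2 is set to the date at which you want to calculate someone's age
DECLARE @AgeAtDate INTEGER
SET @AgeAtDate = 10

DECLARE @BornFrom DATETIME
DECLARE @BornUntil DATETIME
-- Anyone born on or before this date has already turned @AgeAtDate + 1
SELECT @BornFrom = DATEADD(yyyy, -(@AgeAtDate + 1), @Date2)
-- Anyone born after this date has not yet turned @AgeAtDate
SELECT @BornUntil = DATEADD(yyyy, -@AgeAtDate, @Date2)

-- DOB is compared directly, so an index on DOB can be used
SELECT DOB
FROM YourTable
WHERE DOB > @BornFrom AND DOB <= @BornUntil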
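
Adapted to the 15 to 17 range from the question, a sketch might look like this. The variables @MinAge and @MaxAge and the value given to @Date2 are hypothetical; M, #TEMP and their columns come from the question and are assumed to exist. Note that this selects by actual age on @Date2, which is not identical to what DATEDIFF(year, ...) returns, so the result set can differ from the original query.

```sql
-- Hypothetical value; use whatever date the age should be calculated at.
DECLARE @Date2 DATETIME
SET @Date2 = '20100101'

DECLARE @MinAge INTEGER
DECLARE @MaxAge INTEGER
SET @MinAge = 15
SET @MaxAge = 17

DECLARE @BornFrom DATETIME
DECLARE @BornUntil DATETIME
-- Anyone born on or before this date has already turned @MaxAge + 1.
SELECT @BornFrom = DATEADD(yyyy, -(@MaxAge + 1), @Date2)
-- Anyone born after this date has not yet turned @MinAge.
SELECT @BornUntil = DATEADD(yyyy, -@MinAge, @Date2)

SELECT DISTINCT M.MID, M.RE
FROM #TEMP
INNER JOIN M ON #TEMP.MID = M.MID
WHERE #TEMP.Property1 = 1
  AND M.DOB > @BornFrom AND M.DOB <= @BornUntil
```

Because the two boundary dates are computed once up front, the WHERE clause contains no function calls on M.DOB at all.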

An important note to add: for age calculations from DOB, this approach is also more accurate. Your current implementation only takes the year of birth into account, not the actual day (e.g. someone born on 1st Dec 2009 would show as being 1 year old on 1st Jan 2010, when they are not 1 until 1st Dec 2010).
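
A quick way to see this behaviour for yourself, using literal dates chosen purely for illustration:

```sql
-- DATEDIFF counts the year boundaries crossed between the two dates,
-- not the number of full years elapsed.
SELECT DATEDIFF(year, '20091201', '20100101') AS YearBoundaries;  -- 1, although only one month has passed
SELECT DATEDIFF(year, '20091231', '20100101') AS YearBoundaries;  -- 1, after a single day
SELECT DATEDIFF(year, '20090101', '20091231') AS YearBoundaries;  -- 0, after almost a full year
```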

Hope this helps.

AdaTheDev answered Feb 08 '23 06:02