Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Testing performance of Scalar vs Table-valued functions in sql server

OK so I have read a whole bunch of articles suggesting table-value functions and cross apply give better performance than a scalar udf. I wanted to write my function in both ways and then test to see which one is better - but I cannot figure out what I'm supposed to use/look for to understand which is the better option.

I'm using SQL Server 2005. I've tried running the estimated execution plan, the actual execution plan and analyze query in database engine tuning advisor and I don't know what it is trying to tell me.

Using showplan_all on /off it looks like the table based function will use more cpu 1.157e-06 vs 8.3e-05 but the table function has a total subtree cost of 0.000830157 vs 0.01983356.

The query cost of the table valued function also seems to have a higher cost than the scalar one. Even though I thought it was supposed to be the better option.

So whilst I'd like to prove it myself which one gives the better performance - I'm just not sure what to look for in these tools - so any suggestions would be appreciated!

I need to get an academic year value (based on a date range set in the database) based on a calendar date so the function contents are below - so its just whether I go scalar or table based. This year feeds into other queries..

CREATE FUNCTION fn_AcademicYear
(
    -- Add the parameters for the function here
    @StartDate DateTime
)
RETURNS 
@AcademicYear TABLE 
(
    AcademicYear int
)
AS
BEGIN

DECLARE @YearOffset int, @AcademicStartDate DateTime 

    -- Lookup Academic Year Starting Date
    SELECT @AcademicStartDate = CONVERT(DateTime,[Value])
    FROM dbo.SystemSetting
    WHERE [Key] = 'AcademicYear.StartDate'

    SET @YearOffset = DATEPART(YYYY,@StartDate) - DATEPART(YYYY,@AcademicStartDate);
    -- try setting academic looking start date to year of the date passed in
    SET @AcademicStartDate = DATEADD(YYYY, @YearOffset, @AcademicStartDate);

    IF @StartDate < @AcademicStartDate
    BEGIN
        SET @AcademicStartDate = DATEADD(YYYY, @YearOffset-1, @AcademicStartDate);
    END

      INSERT @AcademicYear
      SELECT YEAR(@AcademicStartDate)

    RETURN 

Thanks!!

like image 490
Jen Avatar asked Dec 15 '10 06:12

Jen


People also ask

What is the difference between scalar valued functions vs table-valued functions?

There are two main types of user-defined functions in SQL based on the data they return: Scalar functions: These types of functions return a single value, i.e float, int, varchar, datetime, etc. Table-Valued functions: These functions return tables.

Which one is faster SP or function?

There is no difference in speed between a query run inside a function and one run inside a procedure. Stored procedures have problems aggregating results, they cannot be composed with other stored procedures.

Which one is good in performance while you compare between stored procedure and functions?

We conclude that the table-valued functions are more flexible to filter results horizontally and vertically, to use a select into. Stored procedures are less flexible to reuse the results. In terms of performance, table-valued functions are a good choice. The performance is basically the same than stored procedures.

Which is faster function or stored procedure in SQL Server?

Advantages of Functions You can create a function once, store it in your database and use it whenever you need. Similar to Stored Procedures, Transact-SQL User-Defined Functions reduce the compilation cost of Transact-SQL code by caching the plans and reusing them for repeated executions. This makes it faster.


1 Answers

You may not be seeing the performance gains you expect because your table-valued function is multifunction, not in-line. Multi-function TVFs have to be executed in the same way as scalar UDFs - once per row - so there's very little gain.

Following the example in this article by Itzik Ben-Gan (which discusses the benefits of in-line TVFs), set up the following test:

Create a numbers table with 1 million rows:

SET NOCOUNT ON;
IF OBJECT_ID('dbo.T1') IS NOT NULL DROP TABLE T1;
GO

WITH
  L0 AS (SELECT 0 AS c UNION ALL SELECT 0),
  L1 AS (SELECT 0 AS c FROM L0 AS A CROSS JOIN L0 AS B),
  L2 AS (SELECT 0 AS c FROM L1 AS A CROSS JOIN L1 AS B),
  L3 AS (SELECT 0 AS c FROM L2 AS A CROSS JOIN L2 AS B),
  L4 AS (SELECT 0 AS c FROM L3 AS A CROSS JOIN L3 AS B),
  L5 AS (SELECT 0 AS c FROM L4 AS A CROSS JOIN L4 AS B),
  Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)
SELECT n INTO dbo.T1 FROM Nums WHERE n <= 1000000;

Run a million executions of your TVF using the following code:

set statistics time on
SELECT n,DATEADD(HOUR,n,'1900-01-01'),AY.AcademicYear
FROM T1
CROSS APPLY dbo.fn_AcademicYear(DATEADD(HOUR,n,'1900-01-01')) AS AY
set statistics time off

On my system, this showed an average of 83 seconds elapsed time for three executions, running DBCC dropcleanbuffers between each execution.

If you perform a similar test for your scalar valued function, you should have a clearer idea of comparative performance.

The test also revealed what appears to be a bug in your function. If the AcademicYear.StartDate is set to '2010-09-01', the Academic Year returned for an input of '1900-01-01' is 1789, where it seems like 1899 would be expected.

To get the best performance, you'd need to convert the TVF to be in-line - I came up with the following, which I believe corrects the bug:

CREATE FUNCTION fn_AcademicYear2
(
    @StartDate DATETIME
)
RETURNS TABLE
AS
RETURN
(
    -- Lookup Academic Year Starting Date
    WITH dtCTE
    AS
    (
        SELECT CONVERT(DATETIME,[Value]) AS dt
        FROM dbo.SystemSetting
        WHERE [KEY] = 'AcademicYear.StartDate'
    )
    SELECT CASE WHEN @StartDate >= DATEADD(YEAR,DATEDIFF(YEAR,dt,@StartDate),dt) 
                THEN YEAR(@StartDate)
                ELSE YEAR(DATEADD(YEAR,DATEDIFF(YEAR,dt,@StartDate) - 1,dt))
            END AS AcademicYear
    FROM dtCTE
)
GO

This had an average elapsed time of 8.9 seconds over three runs - almost ten times faster.

The other thing to consider is that the performance benefit from using TVF will be negligible unless you are applying it to multiple rows, as in this test. If you're using it on one value at a time, you won't see a lot of benfit unless you have thousands of instances of the function executing in parallel.

like image 157
Ed Harper Avatar answered Oct 12 '22 16:10

Ed Harper