Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why SQL functions are faster than UDF

Though it's a quite subjective question but I feel it necessary to share on this forum.

I have personally experienced that when I create a UDF (even if that is not complex) and use it into my SQL it drastically decrease the performance. But when I use SQL inbuild function they happen to work pretty faster. Conversion , logical & string functions are clear example of that.

So, my question is "Why SQL in build functions are faster than UDF"? and it would be an advantage if someone can guide me how can I judge/manipulate function cost either mathematically or logically.

like image 801
Zerotoinfinity Avatar asked Nov 09 '13 22:11

Zerotoinfinity


People also ask

Why is UDF so slow?

The reason that Python UDF is slow, is probably the PySpark UDF is not implemented in a most optimized way: According to the paragraph from the link. Spark added a Python API in version 0.7, with support for user-defined functions.

Are user-defined functions faster?

There is no difference in speed between a query run inside a function and one run inside a procedure. Stored procedures have problems aggregating results, they cannot be composed with other stored procedures.

How can I speed up my UDF?

Make Your Scalar UDF just Run faster by Using SQL Server version 15. x. If you want to make your Scalar UDF run faster without making any coding changes, then SQL Server 2019 is for you. With this new version of SQL Server, the FROID framework was added.


1 Answers

This is a well known issue with scalar UDFs in SQL Server.

They are not inlined into the plan and calling them adds overhead compared with having the same logic inline.

The following takes just under 2 seconds on my machine

WITH T10(N) AS 
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) --10 rows                                    
, T(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
           FROM T10 a, T10 b, T10 c, T10 d, T10 e, T10 f, T10 g)  -- 10 million rows
SELECT MAX(N - N)
FROM T
OPTION (MAXDOP 1)

Creating the simple scalar UDF

CREATE FUNCTION dbo.F1 (@N BIGINT)
RETURNS BIGINT 
WITH SCHEMABINDING
AS
BEGIN
RETURN (@N - @N)
END

And changing the query to MAX(dbo.F1(N)) instead of MAX(N - N) it takes around 26 seconds with STATISTICS TIME OFF and 37 with it on.

An average increase of 2.6μs / 3.7μs for each of the 10 million function calls.

Running the Visual Studio profiler shows that the vast majority of time is taken under UDFInvoke. The names of the methods in the call stack gives some idea of what the additional overhead is doing (copying parameters, executing statements, setting up security context).

enter image description here

Moving the logic into an inline table valued function

CREATE FUNCTION dbo.F2 (@N BIGINT)
RETURNS TABLE
RETURN(SELECT @N - @N AS X)

And rewriting the query as

SELECT MAX(X)
FROM Nums
CROSS APPLY dbo.F2(N)

executes in as fast as a time as the original query that does not use any functions.

like image 76
Martin Smith Avatar answered Oct 04 '22 15:10

Martin Smith