Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do SQL Server Scalar-valued functions get slower?

Why do Scalar-valued functions seem to cause queries to run cumulatively slower the more times in succession that they are used?

I have this table that was built with data purchased from a 3rd party.

I've trimmed out some stuff to make this post shorter... but just so you get the idea of how things are setup.

CREATE TABLE [dbo].[GIS_Location](
        [ID] [int] IDENTITY(1,1) NOT NULL, --PK
        [Lat] [int] NOT NULL,
        [Lon] [int] NOT NULL,
        [Postal_Code] [varchar](7) NOT NULL,
        [State] [char](2) NOT NULL,
        [City] [varchar](30) NOT NULL,
        [Country] [char](3) NOT NULL,

CREATE TABLE [dbo].[Address_Location](
    [ID] [int] IDENTITY(1,1) NOT NULL, --PK
    [Address_Type_ID] [int] NULL,
    [Location] [varchar](100) NOT NULL,
    [State] [char](2) NOT NULL,
    [City] [varchar](30) NOT NULL,
    [Postal_Code] [varchar](10) NOT NULL,
    [Postal_Extension] [varchar](10) NULL,
    [Country_Code] [varchar](10) NULL,

Then I have two functions that look up LAT and LON.

CREATE FUNCTION [dbo].[usf_GIS_GET_LAT]
(
    @City VARCHAR(30),
    @State CHAR(2)
)
RETURNS INT 
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @LAT INT

    SET @LAT = (SELECT TOP 1 LAT FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)

RETURN @LAT
END


CREATE FUNCTION [dbo].[usf_GIS_GET_LON]
(
    @City VARCHAR(30),
    @State CHAR(2)
)
RETURNS INT 
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @LON INT

    SET @LON = (SELECT TOP 1 LON FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)

RETURN @LON
END

When I run the following...

SET STATISTICS TIME ON

SELECT
    dbo.usf_GIS_GET_LAT(City,[State]) AS Lat,
    dbo.usf_GIS_GET_LON(City,[State]) AS Lon
FROM
    Address_Location WITH(NOLOCK)
WHERE
    ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)

SET STATISTICS TIME OFF

100 ~= 8 ms, 200 ~= 32 ms, 400 ~= 876 ms

--Edit Sorry I should have been more clear. I'm not looking to tune the query listed above. This is just a sample to show the execution time getting slower the more records it crunches through. In the real world application the functions are used as part of a where clause to build a radius around a city and state to include all records with in that region.

like image 638
DBAndrew Avatar asked Apr 28 '09 22:04

DBAndrew


People also ask

What causes SQL Server to slow down?

SQL Server caches the databases in what's called a buffer pool. A buffer pool acts as a storage space for data pages that have been recently written or read to and from disk. A small buffer pool will slow down your SQL application by overwhelming the disk's subsystem.

Why are SQL views so slow?

When SQL Server processes a SELECT from a view, it evaluates the code in the view BEFORE it deals with the WHERE clause or any join in the outer query. With more tables joined, it will be slow compared to a SELECT from base tables with the same results.

What is scalar valued function in SQL Server?

What is a Scalar Valued Function in SQL Server? A Scalar Valued Function is a function that returns a single value back to the caller. A Scalar Valued Function can be used anywhere a single value is expected. You can pass in one or more parameters to a Scalar Valued Function and do work based on those parameters.

Do views slow down queries?

Views make queries faster to write, but they don't improve the underlying query performance. However, we can add a unique, clustered index to a view, creating an indexed view, and realize potential and sometimes significant performance benefits, especially when performing complex aggregations and other calculations.


2 Answers

In most cases, it's best to avoid scalar valued functions that reference tables because (as others said) they are basically black boxes that need to be ran once for every row, and cannot be optimized by the query plan engine. Therefore, they tend to scale linearly even if the associated tables have indexes.

You may want to consider using an inline-table-valued function, since they are evaluated inline with the query, and can be optimized. You get the encapsulation you want, but the performance of pasting the expressions right in the select statement.

As a side effect of being inlined, they can't contain any procedural code (no declare @variable; set @variable = ..; return). However, they can return several rows and columns.

You could re-write your functions something like this:

create function usf_GIS_GET_LAT(
    @City varchar (30),
    @State char (2)
)
returns table
as return (
  select top 1 lat
  from GIS_Location with (nolock) 
  where [State] = @State
    and [City] = @City
);

GO

create function usf_GIS_GET_LON (
    @City varchar (30),
    @State char (2)
)
returns table
as return (
  select top 1 LON
  from GIS_Location with (nolock)
  where [State] = @State
    and [City] = @City
);

The syntax to use them is also a little different:

select
    Lat.Lat,
    Lon.Lon
from
    Address_Location with (nolock)
    cross apply dbo.usf_GIS_GET_LAT(City,[State]) AS Lat
    cross apply dbo.usf_GIS_GET_LON(City,[State]) AS Lon
WHERE
    ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)
like image 78
John Gibb Avatar answered Oct 15 '22 17:10

John Gibb


They do not.

There is no bug in scalar functions that causes its performance to degrade exponentially depending on the number of rows in the scalar function is executed against. Try your tests again and have a look at SQL profiler, looking at the CPU and READS and DURATION columns. Increase you test size to include tests that take longer than a second, two seconds, five seconds.

CREATE FUNCTION dbo.slow
(
    @ignore int
)
RETURNS INT 
AS
BEGIN
    DECLARE @slow INT
    SET @slow = (select count(*) from sysobjects a 
        cross join sysobjects b 
        cross join sysobjects c 
        cross join sysobjects d 
        cross join sysobjects e 
        cross join sysobjects f
    where a.id = @ignore) 

    RETURN @slow
END
go
SET STATISTICS TIME ON

select top 1 dbo.slow(id)
from sysobjects
go
select top 5 dbo.slow(id)
from sysobjects
go
select top 10 dbo.slow(id)
from sysobjects
go
select top 20 dbo.slow(id)
from sysobjects
go
select top 40 dbo.slow(id)
from sysobjects

SET STATISTICS TIME OFF

Output

SQL Server Execution Times:
   CPU time = 203 ms,  elapsed time = 202 ms.


SQL Server Execution Times:
   CPU time = 889 ms,  elapsed time = 939 ms.

SQL Server Execution Times:
   CPU time = 1748 ms,  elapsed time = 1855 ms.

SQL Server Execution Times:
   CPU time = 3541 ms,  elapsed time = 3696 ms.


SQL Server Execution Times:
   CPU time = 7207 ms,  elapsed time = 7392 ms.

Keep in mind that if you are running a scalar function against rows in the result set, the scalar function will be executed per-row with no global optimisation.

like image 26
Sam Saffron Avatar answered Oct 15 '22 17:10

Sam Saffron