Take the (simplified) stored procedure defined here:
create procedure get_some_stuffs
@max_records int = null
as
begin
set NOCOUNT on
select top (@max_records) *
from my_table
order by mothers_maiden_name
end
I want to restrict the number of records selected only if @max_records is provided.
Problems:
The real query is nasty and large; I want to avoid having it duplicated similar to this:
if(@max_records is null)
begin
select *
from {massive query}
end
else
begin
select top (@max_records) *
from {massive query}
end
An arbitrary sentinel value doesn't feel right:
select top (ISNULL(@max_records, 2147483647)) *
from {massive query}
For example, if @max_records is null and {massive query} returns fewer than 2147483647 rows, would this be identical to:
select *
from {massive query}
or is there some kind of penalty for selecting top (2147483647) * from a table with only 50 rows?
Are there any other existing patterns that allow for an optionally count-restricted result set without duplicating queries or using sentinel values?
In general, TOP and ORDER BY are used together. Without an ORDER BY, the TOP clause returns N rows in no guaranteed order, so it is best practice to pair TOP with an ORDER BY to get a deterministic result.
The SELECT TOP clause is used to specify the number of records to return. The SELECT TOP clause is useful on large tables with thousands of records. Returning a large number of records can impact performance. Note: Not all database systems support the SELECT TOP clause.
So if we see TOP (100) PERCENT in code, we suspect that the developer has attempted to create an ordered view, and we check whether that ordering is important before going any further. Chances are that the query that uses that view (or the client application) needs to be modified instead.
I was thinking about this, and although I like the explicitness of the IF statement in your Problem 1, I understand the issue of duplication. As such, you could put the main query in a single CTE, and use some trickery to query from it:
CREATE PROC get_some_stuffs
(
@max_records int = NULL
)
AS
BEGIN
SET NOCOUNT ON;
WITH staged AS (
-- Only write the main query one time
SELECT * FROM {massive query}
)
-- This part below the main query never changes:
SELECT *
FROM (
-- A little switcheroo based on the value of @max_records
SELECT * FROM staged WHERE @max_records IS NULL
UNION ALL
SELECT TOP(ISNULL(@max_records, 0)) * FROM staged WHERE @max_records IS NOT NULL
) final
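-- ORDER BY on a UNION applies to the combined result, so it goes out here.
-- Note: the TOP branch above has no ORDER BY of its own, so which rows it keeps is not guaranteed.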
ORDER BY mothers_maiden_name
END
I looked at the actual query plans for each and the optimizer is smart enough to completely avoid the part of the UNION ALL that doesn't need to run.
The ISNULL(@max_records, 0) is in there because TOP NULL isn't valid, and it will not compile.
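As a quick illustration of how the procedure above would be called both ways, here is a minimal sketch. It assumes get_some_stuffs was created as shown and that {massive query} has been filled in with a real query.

```sql
-- No limit: @max_records falls back to its default of NULL,
-- so only the "IS NULL" branch of the UNION ALL contributes rows.
EXEC get_some_stuffs;

-- Limit: only the TOP branch contributes rows, at most 10 of them.
EXEC get_some_stuffs @max_records = 10;
```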
There are a few methods, but as you probably notice these all look ugly or are unnecessarily complicated. Furthermore, do you really need that ORDER BY?
You could use TOP (100) PERCENT and a View, but the PERCENT only works if you do not really need that expensive ORDER BY, since SQL Server will ignore your ORDER BY if you try it.
I suggest taking advantage of stored procedures, but first let's explain the difference between the types of procs:
Hard Coded Parameter Sniffing
--Note the lack of a real parametrized column. See notes below.
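-- 'P' is the object type for procedures ('U' would look for a user table)
IF OBJECT_ID('[dbo].[USP_TopQuery]', 'P') IS NULL
    -- A procedure body needs at least one statement, so stub it with RETURN
    EXECUTE('CREATE PROC dbo.USP_TopQuery AS RETURN;')
GO
ALTER PROC [dbo].[USP_TopQuery] @MaxRows NVARCHAR(50)
AS
BEGIN
    DECLARE @SQL NVARCHAR(4000) = N'SELECT * FROM dbo.ThisFile'
          , @Option NVARCHAR(50) = N'TOP (' + @MaxRows + N') *'
    -- No usable row count supplied: run the unrestricted query
    IF ISNUMERIC(@MaxRows) = 0
        EXEC sp_executesql @SQL
    ELSE
    BEGIN
        -- Splice TOP (n) into the statement text
        SET @SQL = REPLACE(@SQL, '*', @Option)
        EXEC sp_executesql @SQL
    END
END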
Local Variable Parameter Sniffing
-- 'P' is the object type for procedures ('U' would look for a user table)
IF OBJECT_ID('[dbo].[USP_TopQuery2]', 'P') IS NULL
    EXECUTE('CREATE PROC dbo.USP_TopQuery2 AS RETURN;')
GO
ALTER PROC [dbo].[USP_TopQuery2] @MaxRows INT NULL
AS
BEGIN
    -- Copy the parameter into a local variable
    DECLARE @Rows INT;
    SET @Rows = @MaxRows;
    IF @MaxRows IS NULL
        SELECT *
        FROM dbo.ThisFile
    ELSE
        SELECT TOP (@Rows) *
        FROM dbo.ThisFile
END
No Parameter Sniffing, old method
-- 'P' is the object type for procedures ('U' would look for a user table)
IF OBJECT_ID('[dbo].[USP_TopQuery3]', 'P') IS NULL
    EXECUTE('CREATE PROC dbo.USP_TopQuery3 AS RETURN;')
GO
ALTER PROC [dbo].[USP_TopQuery3] @MaxRows INT NULL
AS
BEGIN
    IF @MaxRows IS NULL
        SELECT *
        FROM dbo.ThisFile
    ELSE
        SELECT TOP (@MaxRows) *
        FROM dbo.ThisFile
END
PLEASE NOTE ABOUT PARAMETER SNIFFING:
SQL Server initializes variables in Stored Procs at the time of compile, not when it parses.
This means that SQL Server will be unable to guess the query and will choose the last valid execution plan for the query, regardless of whether it is even good.
There are two methods, hard coding and local variables, that allow the Optimizer to guess.
Hard Coding for Parameter Sniffing
(ON, WHERE, HAVING) RECOMPILE to overcome this issue.
Variable Parameter Sniffing
Ultimately, the issue of performance is about which method will use the fewest steps to traverse the leaf pages. Statistics, the number of rows in your table, and the rules SQL Server uses to choose a Scan vs. a Seek all affect performance.
Running with different values will show that performance changes significantly, though it is typically better than USP_TopQuery3. So DO NOT ASSUME one method is necessarily better than the other.
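Since RECOMPILE comes up above, here is a minimal sketch of what a statement-level recompile hint could look like when applied to the single-query sentinel form from the question. This is an illustration, not something either answer tested: it reuses my_table and mothers_maiden_name from the question, and whether the per-execution compile cost is acceptable depends on your workload.

```sql
-- Sketch only: assumes this runs inside a procedure that declares @max_records int = NULL.
-- OPTION (RECOMPILE) compiles the statement on every execution, so the plan
-- is built for the current value of @max_records rather than a cached one.
SELECT TOP (ISNULL(@max_records, 2147483647)) *
FROM my_table
ORDER BY mothers_maiden_name
OPTION (RECOMPILE);
```

The trade-off is extra compile time on each call, so compare actual execution plans and timings before settling on it.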