
Are there any existing, elegant, patterns for an optional TOP clause?

Tags:

sql

sql-server

Take the (simplified) stored procedure defined here:

create procedure get_some_stuffs
  @max_records int = null
as
begin
  set NOCOUNT on

  select top (@max_records) *
  from my_table
  order by mothers_maiden_name
end

I want to restrict the number of records selected only if @max_records is provided.

Problems:

  1. The real query is nasty and large; I want to avoid having it duplicated similar to this:

    if(@max_records is null)
    begin
      select *
      from {massive query}
    end
    else
    begin
      select top (@max_records) *
      from {massive query}
    end
    
  2. An arbitrary sentinel value doesn't feel right:

    select top (ISNULL(@max_records, 2147483647)) *
    from {massive query}
    

    For example, if @max_records is null and {massive query} returns fewer than 2147483647 rows, would this be identical to:

    select * 
    from {massive query}
    

    or is there some kind of penalty for selecting top (2147483647) * from a table with only 50 rows?

Are there any other existing patterns that allow for an optionally count-restricted result set without duplicating queries or using sentinel values?

Alex McMillan asked Mar 07 '17 01:03



2 Answers

I was thinking about this, and although I like the explicitness of the IF statement in your Problem 1, I understand the concern about duplication. Instead, you could put the main query in a single CTE and use some trickery to query from it:

CREATE PROC get_some_stuffs
(
    @max_records int = NULL
)
AS
BEGIN
    SET NOCOUNT ON;

    WITH staged AS (
        -- Only write the main query one time
        SELECT * FROM {massive query}
    )
    -- This part below the main query never changes:
    SELECT *
    FROM (
        -- A little switcheroo based on the value of @max_records
        SELECT * FROM staged WHERE @max_records IS NULL
        UNION ALL
        SELECT * FROM (
            -- ORDER BY inside, so TOP keeps the first rows by name rather than arbitrary ones
            SELECT TOP (ISNULL(@max_records, 0)) *
            FROM staged
            WHERE @max_records IS NOT NULL
            ORDER BY mothers_maiden_name
        ) limited
    ) final
    -- A UNION branch can't carry its own ORDER BY directly, so the final sort goes out here
    ORDER BY mothers_maiden_name
END

I looked at the actual query plans for each and the optimizer is smart enough to completely avoid the part of the UNION ALL that doesn't need to run.

The ISNULL(@max_records, 0) is in there because TOP NULL isn't valid, and it will not compile.
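
As a rough usage sketch, the procedure could then be called either way. This assumes get_some_stuffs has been created with a real query in place of the {massive query} placeholder:

```sql
-- Assumes get_some_stuffs exists with a real query in place of {massive query}
EXEC get_some_stuffs;                     -- @max_records defaults to NULL: all rows
EXEC get_some_stuffs @max_records = 10;   -- at most 10 rows
```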

Cᴏʀʏ answered Sep 23 '22 23:09


There are a few methods, but as you have probably noticed, these all look ugly or are unnecessarily complicated. Furthermore, do you really need that ORDER BY?

You could use TOP (100) PERCENT and a view, but that only works if you do not really need that expensive ORDER BY, since SQL Server will ignore an ORDER BY paired with TOP (100) PERCENT in a view.

I suggest taking advantage of stored procedures, but first let's explain the difference between the types of procs:

Hard Coded Parameter Sniffing

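--Note the lack of a real parametrized column. See notes below.
-- 'P' is the object type for a stored procedure ('U' is a user table)
IF OBJECT_ID('[dbo].[USP_TopQuery]', 'P') IS NULL
    EXECUTE('CREATE PROC dbo.USP_TopQuery AS SELECT 1')
GO
ALTER PROC [dbo].[USP_TopQuery] @MaxRows NVARCHAR(50)
AS
BEGIN
-- TRY_CONVERT (SQL Server 2012+) returns NULL for anything that is not a valid INT,
-- so only a validated number is ever concatenated into the SQL text
DECLARE @Rows INT = TRY_CONVERT(INT, @MaxRows)
DECLARE @SQL NVARCHAR(4000) = N'SELECT * FROM dbo.ThisFile'
      , @Option NVARCHAR(50)
IF @Rows IS NULL
    EXEC sp_executesql @SQL
ELSE
    BEGIN
        SET @Option = N'TOP (' + CONVERT(NVARCHAR(11), @Rows) + N') *'
        SET @SQL = REPLACE(@SQL, '*', @Option)
        EXEC sp_executesql @SQL
    END
END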
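
For comparison, here is a minimal sketch of passing the row count to sp_executesql as a real parameter instead of concatenating it into the statement text. The parameter name @n is made up for illustration, and dbo.ThisFile is the table used in the procedure above. The NULL case would still need its own branch, since TOP does not accept NULL.

```sql
-- Sketch only: dbo.ThisFile is the table used in the procedure above;
-- @n is a hypothetical parameter name.
DECLARE @MaxRows INT = 10;

EXEC sp_executesql
    N'SELECT TOP (@n) * FROM dbo.ThisFile',  -- statement text never changes
    N'@n INT',                               -- parameter definition
    @n = @MaxRows;                           -- value is bound, not concatenated
```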

Local Variable Parameter Sniffing

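-- 'P' is the object type for a stored procedure ('U' is a user table)
IF OBJECT_ID('[dbo].[USP_TopQuery2]', 'P') IS NULL
    EXECUTE('CREATE PROC dbo.USP_TopQuery2 AS SELECT 1')
GO
ALTER PROC [dbo].[USP_TopQuery2] @MaxRows INT = NULL
AS
BEGIN
-- Copy the parameter into a local variable; the optimizer cannot see its value at compile time
DECLARE @Rows INT;
    SET @Rows = @MaxRows;

IF @MaxRows IS NULL
    SELECT *
    FROM dbo.ThisFile
ELSE
    SELECT TOP (@Rows) *
    FROM dbo.ThisFile
END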

No Parameter Sniffing, old method

-- 'P' is the object type for a stored procedure ('U' is a user table)
IF OBJECT_ID('[dbo].[USP_TopQuery3]', 'P') IS NULL
    EXECUTE('CREATE PROC dbo.USP_TopQuery3 AS SELECT 1')
GO
ALTER PROC [dbo].[USP_TopQuery3] @MaxRows INT = NULL
AS
BEGIN

IF @MaxRows IS NULL
    SELECT *
    FROM dbo.ThisFile
ELSE
    SELECT TOP (@MaxRows) *
    FROM dbo.ThisFile
END

PLEASE NOTE ABOUT PARAMETER SNIFFING:

SQL Server compiles the plan for a stored procedure on first execution, using the parameter values passed in on that call (it "sniffs" them). The value of a local variable is not known at that point.

This means that SQL Server will reuse the cached execution plan for later calls, regardless of whether it is even a good fit for the new values.

There are two methods, hard coding and local variables, that change what the Optimizer has to guess from.

  1. Hard Coding for Parameter Sniffing

    • sp_executesql lets the query plan be reused, and it guards against SQL injection when values are passed as parameters rather than concatenated into the string.
    • However, this type of query will not always perform substantially better, since a TOP operator is not a column or table (so the statement effectively has no variables in the version I used).
    • Statistics at the time your compiled plan is created will dictate how effective the method is if you are not using a variable in a predicate (ON, WHERE, HAVING).
    • You can use OPTION (RECOMPILE) on the statement, or WITH RECOMPILE on the procedure, to overcome this issue.
  2. Variable Parameter Sniffing

    • Variable parameter sniffing, on the other hand, is flexible enough to work with the statistics here, and in my own testing the variable version seemed to have the advantage of the query using statistics (particularly after I updated the statistics).
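
As an illustration of the recompile hint mentioned in the list above, here is a minimal sketch that applies it to the sentinel form from the question. The table and column names are the question's; whether recompiling on every call is worth the cost depends on how often the query runs.

```sql
-- Sketch only: my_table and mothers_maiden_name come from the question;
-- the sentinel 2147483647 is the maximum INT value.
DECLARE @max_records INT = NULL;  -- NULL means "no limit" here

SELECT TOP (ISNULL(@max_records, 2147483647)) *
FROM my_table
ORDER BY mothers_maiden_name
OPTION (RECOMPILE);  -- compile a fresh plan for this run, using the current variable value
```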

Top Queries

  • Ultimately, the issue of performance is about which method will use the fewest steps to traverse the leaf pages. Statistics, the rows in your table, and the rules for when SQL Server will decide to use a Scan vs a Seek all impact the performance.

  • Running with different values will show performance change significantly, though it is typically better than USP_TopQuery3. So DO NOT ASSUME one method is necessarily better than the other.

    • Also note you can use a table-valued function to do the same, but as Pinal Dave would say:

If you are going to answer that ‘To avoid repeating code, you use Function’ ‑ please think harder! Stored procedure can do the same...

if you are going to answer with ‘Function can be used in SELECT, whereas Stored Procedure cannot be used’ ‑ again think harder!

SQL SERVER – Question to You – When to use Function and When to use Stored Procedure
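
For reference, a table-valued function version might look roughly like the sketch below. The function name dbo.ufn_TopQuery is hypothetical, dbo.ThisFile is the table from the procedures above, and the sketch falls back on the sentinel value the question was hoping to avoid.

```sql
-- Sketch only: dbo.ufn_TopQuery is a hypothetical name; dbo.ThisFile is the
-- table from the procedures above. 2147483647 is the maximum INT value.
CREATE FUNCTION dbo.ufn_TopQuery (@MaxRows INT)
RETURNS TABLE
AS
RETURN
(
    SELECT TOP (ISNULL(@MaxRows, 2147483647)) *
    FROM dbo.ThisFile
);
GO

-- Usage: pass NULL for all rows, or a number to cap the result
SELECT * FROM dbo.ufn_TopQuery(NULL);
SELECT * FROM dbo.ufn_TopQuery(10);
```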

clifton_h answered Sep 24 '22 23:09