
Why could a SELECT TOP clause make a query take much longer?

The following query takes forever to finish, but if I remove the TOP 10 clause, it finishes rather quickly. big_table_1 and big_table_2 are two tables with 10^5 records.

I used to believe that a TOP clause would reduce the time cost, but apparently it does not here. Why?

select top 10 ServiceRequestID
from 
(
    (select * 
     from  big_table_1
     where big_table_1.StatusId=2
    ) cap1
    inner join
      big_table_2 cap2
    on cap1.ServiceRequestID = cap2.CustomerReferenceNumber
    )
smwikipedia asked Mar 08 '12 11:03




3 Answers

There are other Stack Overflow discussions on this same topic (links at bottom). As noted in the comments above, it might have something to do with indexes and the optimizer getting confused and using the wrong one.

My first thought is that you are doing a select top ServiceRequestID from (select * ...), and the optimizer may have difficulty pushing the outer query down into the inner queries and making use of the index.

Consider rewriting it as:

-- big_table_1 is aliased as cap1 so the join condition resolves
select top 10 cap1.ServiceRequestID
from big_table_1 cap1
inner join big_table_2 cap2
on cap1.ServiceRequestID = cap2.CustomerReferenceNumber
and cap1.StatusId = 2

In your query, the database is probably joining the full result sets first and only THEN limiting the output to the top 10 in the outer query. In the query above, the database only has to gather the first 10 matching rows as the join proceeds, which can save a lot of time. And if ServiceRequestID is indexed, the optimizer is more likely to use that index. In your version, the outer query looks for the ServiceRequestID column in a derived result set that is virtual and unindexed.
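To make the indexing point concrete, here is a minimal sketch of indexes that could support this join. The table and column names come from the question; the index names are hypothetical, and this assumes no equivalent indexes already exist. Whether the optimizer actually picks them depends on your data and statistics.

```sql
-- Hypothetical index names; tables and columns are taken from the question.
-- Supports the StatusId = 2 filter and carries ServiceRequestID for the join.
CREATE NONCLUSTERED INDEX IX_big_table_1_StatusId
    ON big_table_1 (StatusId, ServiceRequestID);

-- Supports lookups on the other side of the join.
CREATE NONCLUSTERED INDEX IX_big_table_2_CustomerReferenceNumber
    ON big_table_2 (CustomerReferenceNumber);
```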

Hope that makes sense. While hypothetically the optimizer is supposed to take whatever format we put SQL in and figure out the best way to return values every time, the truth is that the way we put our SQL together can really impact the order in which certain steps are done on the DB.
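One way to check which steps differ is to run both variants with I/O and timing statistics turned on and compare the output in the Messages tab, along with the actual execution plans. This is only a sketch that reuses the table and column names from the question; the aliases are chosen for illustration.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: no TOP clause.
SELECT cap1.ServiceRequestID
FROM big_table_1 AS cap1
INNER JOIN big_table_2 AS cap2
    ON cap1.ServiceRequestID = cap2.CustomerReferenceNumber
WHERE cap1.StatusId = 2;

-- Variant 2: same join, limited to 10 rows.
SELECT TOP (10) cap1.ServiceRequestID
FROM big_table_1 AS cap1
INNER JOIN big_table_2 AS cap2
    ON cap1.ServiceRequestID = cap2.CustomerReferenceNumber
WHERE cap1.StatusId = 2;

-- Turn the reporting back off.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the two variants show very different join operators or read counts, that points at the plan choice rather than at the amount of data returned.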

SELECT TOP is slow, regardless of ORDER BY

Why is doing a top(1) on an indexed column in SQL Server slow?

user158017 answered Sep 21 '22 17:09



I had a similar problem with a query like yours. With an ORDER BY but without the TOP clause, the query took 1 second; the same query with TOP 3 took 1 minute.

I found that when I used a variable for the TOP value, it worked as expected.

The code for your case:

declare @top int = 10;

select top (@top) ServiceRequestID
from 
(
    (select * 
     from  big_table_1
     where big_table_1.StatusId=2
    ) cap1
    inner join
      big_table_2 cap2
    on cap1.ServiceRequestID = cap2.CustomerReferenceNumber
    )
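A possible explanation, which is an assumption and not something confirmed in this thread, is that a literal TOP 10 makes the optimizer plan for a "row goal" of 10 rows, while a variable hides that value at compile time. If that is what is happening, newer versions of SQL Server (2016 SP1 and later, so not available when this question was asked) offer a query hint that asks the optimizer to ignore the row goal. A sketch, with the query flattened and aliased for illustration:

```sql
-- Assumes SQL Server 2016 SP1 or later; the hint is rejected on older versions.
SELECT TOP (10) cap1.ServiceRequestID
FROM big_table_1 AS cap1
INNER JOIN big_table_2 AS cap2
    ON cap1.ServiceRequestID = cap2.CustomerReferenceNumber
WHERE cap1.StatusId = 2
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```

Treat this as something to try and measure, not a guaranteed fix.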
Javier Suero Santos answered Sep 18 '22 17:09



I can't explain why, but I can give an idea:

Try adding SET ROWCOUNT 10 before your query. It helped me in some cases. Bear in mind that this setting stays in effect for the session, so you have to set it back after running your query (SET ROWCOUNT 0 turns the limit off).

Explanation: SET ROWCOUNT: Causes SQL Server to stop processing the query after the specified number of rows are returned.
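A minimal sketch of that pattern, using the table and column names from the question (the aliases and the flattened join are for illustration only):

```sql
-- Stop processing after 10 rows for every following statement in this session.
SET ROWCOUNT 10;

SELECT cap1.ServiceRequestID
FROM big_table_1 AS cap1
INNER JOIN big_table_2 AS cap2
    ON cap1.ServiceRequestID = cap2.CustomerReferenceNumber
WHERE cap1.StatusId = 2;

-- 0 removes the limit; without this, later statements stay capped at 10 rows.
SET ROWCOUNT 0;
```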

Diego answered Sep 18 '22 17:09
