TSQL - What is the fastest way to check for more than one record?

Sometimes I need to check whether at least one record is present. Usually I use:

IF EXISTS (SELECT TOP 1 1 FROM [SomeTable] WHERE [Fields] = [Values]) BEGIN
-- action
END

Is there a fast way to check if more than one record is present? I could do something like:

IF EXISTS (SELECT 1 FROM [SomeTable] 
                        WHERE [Fields] = [Values] 
                                HAVING Count(*) > 1) 
BEGIN
    -- action
END

But I'm not sure if it is the fastest way of doing this as it will test all the records in the set. Is there a faster way?

The 'where' part can be quite complex and could consist of multiple ANDs and ORs.

Kees C. Bakker asked Dec 18 '12 12:12


2 Answers

SQL Server does not generally short circuit aggregate queries. Sometimes it can transform a HAVING COUNT(*) > 0 query to use the same plan as EXISTS (discussed in the comments here) but that's as far as it goes.

A HAVING COUNT(*) > 1 query will always count all rows, even though in theory it could stop counting after the second row.

With that in mind I would use

IF EXISTS(
    SELECT *
    FROM (
        SELECT TOP 2 *
        FROM [SomeTable]
        WHERE [Fields] = [Values]
    ) T
    HAVING COUNT(*) = 2
)

The TOP 2 iterator stops requesting rows after the second one is returned, which lets the inner query short-circuit early rather than returning every matching row and counting them all.
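As a minimal, self-contained sketch of the pattern above, the script below uses a hypothetical table variable (the name @SomeTable, its columns, and the sample rows are assumptions for illustration only) so it can be pasted into a query window and run as-is.

```sql
-- Hypothetical demo data; table and column names are illustrative only.
DECLARE @SomeTable TABLE (
    Id       INT IDENTITY PRIMARY KEY,
    Category VARCHAR(10) NOT NULL
);

INSERT INTO @SomeTable (Category) VALUES ('a'), ('a'), ('b');

-- TOP (2) caps the inner query at two rows, so COUNT(*) never
-- has to look past the second match.
IF EXISTS (
    SELECT 1
    FROM (
        SELECT TOP (2) 1 AS x
        FROM @SomeTable
        WHERE Category = 'a'
    ) AS T
    HAVING COUNT(*) = 2
)
    PRINT 'more than one row';
ELSE
    PRINT 'zero or one row';
```

Selecting a constant in the inner query instead of * is a small variation on the original; inside EXISTS the select list is not expected to matter much, but it avoids naming columns that are never used.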

Example plans for both versions are below

[Image: execution plans for both versions]

Regarding the question in the comments about

"How can you tell which one is best? Is it the query cost?"

In the particular case shown in the plans above, cost would be a reasonable indication, as the estimated and actual row counts are quite accurate and the two plans are very similar except for the addition of the TOP iterator. So the additional cost shown in the plan simply reflects the fact that additional rows need to be scanned (and possibly read in from disk) and counted.

It is quite clear cut in this case that this just represents additional work. In other plans it may not be. The addition of the TOP 2 may change the query tree underneath it significantly (e.g. disfavouring plans with blocking iterators).

In that case the cost shown in execution plans may not be a reliable metric. Even in actual execution plans the cost shown is based on estimates, so it is only as good as those are, and even if the estimated row counts are good, the costs shown are still just based on certain modelling assumptions.

SQL Kiwi puts it well in this recent answer on the DBA site:

optimizer cost estimates are mainly only useful for internal server purposes. They are not intended to be used to assess potential performance, even at a 'high level'. The model is an abstraction that happens to work reasonably well for the internal purposes it was designed for. The chances that estimated costs bear any sensible resemblance to real execution costs on your hardware and configuration is very small indeed.

Choose other metrics to compare performance, based on whatever real issues are important to you.

Logical reads (shown when SET STATISTICS IO ON; is in effect) are one such metric that can be looked at, but again, focusing on this exclusively can be misleading. Testing query duration is probably the only reliable way, but even that is not an exact science, as performance can vary depending on concurrent activity on the server (waits for memory grants, DOP available, number of relevant pages in the cache).
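As a rough sketch of how such a comparison could be set up, the script below builds a hypothetical temp table #Demo (the table, its columns, and the row count are assumptions for illustration, not taken from the question) and runs both variants with SET STATISTICS IO and SET STATISTICS TIME switched on. The reads and timings are written to the messages output and will differ from server to server.

```sql
SET NOCOUNT ON;

-- Hypothetical test table; names and size are illustrative only.
CREATE TABLE #Demo (
    Id       INT IDENTITY PRIMARY KEY,
    Category CHAR(1) NOT NULL
);

-- Cross joining a system view is just a convenient way to generate rows.
INSERT INTO #Demo (Category)
SELECT TOP (100000) 'a'
FROM sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2;

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: plain HAVING COUNT(*) > 1
IF EXISTS (SELECT 1 FROM #Demo WHERE Category = 'a' HAVING COUNT(*) > 1)
    PRINT 'variant 1: more than one row';

-- Variant 2: TOP (2) inside a derived table
IF EXISTS (
    SELECT 1
    FROM (SELECT TOP (2) 1 AS x FROM #Demo WHERE Category = 'a') AS T
    HAVING COUNT(*) = 2
)
    PRINT 'variant 2: more than one row';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

DROP TABLE #Demo;
```

Running each variant several times, and against realistic data rather than a toy table, should give a fairer picture than a single run.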

In the end it just comes down to getting a query plan that looks to be an efficient use of the resources on your server.

Martin Smith answered Sep 24 '22 21:09


I'm sure there are tricks that'll enable you to perform this check faster - although it'll depend very much upon your schema (especially indexes), and a particular check may work for one situation and not for another.

Something like the below might work for you.

IF EXISTS (SELECT * FROM [SomeTable] T1
           INNER JOIN [SomeTable] T2
           ON T1.UniqueID <> T2.UniqueID
           WHERE T1.[Fields] = T1.[Values]
           AND T2.[Fields] = T2.[Values]) 
BEGIN
    -- action
END
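A runnable instance of the self-join idea, assuming a hypothetical table variable with a unique key column (the names @SomeTable, UniqueID, and Category and the sample rows are illustrative, not from the question), might look like this:

```sql
-- Hypothetical demo data; table and column names are illustrative only.
DECLARE @SomeTable TABLE (
    UniqueID INT IDENTITY PRIMARY KEY,
    Category VARCHAR(10) NOT NULL
);

INSERT INTO @SomeTable (Category) VALUES ('a'), ('a'), ('b');

-- A pair of rows with different keys that both satisfy the filter
-- can only exist when at least two rows match.
IF EXISTS (
    SELECT 1
    FROM @SomeTable AS T1
    INNER JOIN @SomeTable AS T2
        ON T1.UniqueID <> T2.UniqueID
    WHERE T1.Category = 'b'
      AND T2.Category = 'b'
)
    PRINT 'more than one row';
ELSE
    PRINT 'zero or one row';
```

Only one row has Category 'b' in this sample, so no such pair exists and the ELSE branch runs. The filter has to be repeated for both aliases, which could get unwieldy when the WHERE clause is complex.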
Will A answered Sep 24 '22 21:09