 

Count(*) vs Count(id) speed

Tags:

sql

sql-server

I know they return different results (COUNT(*) counts every row, while COUNT(id) skips rows where id is NULL). That's not my question. Just imagine a case where I don't care (either because there are no NULLs, or because there are only a few and I only want a general sense of the number of rows in the table).
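For concreteness, the difference the question sets aside can be reproduced in a few lines. This is a sketch using SQLite via Python's sqlite3 module, purely to illustrate the standard NULL semantics; the table name and values are made up, and the question itself is about SQL Server:

```python
import sqlite3

# In-memory database with one nullable integer column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(1,), (2,), (None,)])

# COUNT(*) counts all rows; COUNT(id) ignores the row where id is NULL
count_star = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
count_id = conn.execute("SELECT COUNT(id) FROM t").fetchone()[0]
print(count_star, count_id)  # 3 2
```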

My question is about the following apparent contradiction:

Here one of the highest rep users in the SQL tag says

Your use of COUNT(*) or COUNT(column) should be based on the desired output only.

On the other hand, here is a comment, upvoted 47 times, saying

... if you have a non-nullable column such as ID, then count(ID) will significantly improve performance over count(*).

The two seem to contradict each other. Can someone please explain which one is correct, and why?

Asked Sep 17 '18 by ispiro

2 Answers

Expanding on scsimon's answer, I ran a test of 10 million rows on SQL Server 2017. Performance results are at the bottom.

EDIT

With suggestions from Jeroen's comment, I added more columns to the test table: an int identity, a bigint not null, and a nullable tinyint. In each case, the optimizer answered the select count query with a scan of the index on the tinyint column (the narrowest index). The estimated cost was identical across the queries, but select count(NullableColumn) was noticeably slower than the other select count methods.

Code

if object_id('tempdb..#table') is null
begin
    declare @Count bigint = 10000000

    create table #Table
    (
        id int identity primary key clustered,
        BigID bigint not null,
        TinyC tinyint null
    );

    WITH
        E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
        E16(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+8 or 100,000,000 rows max
        cteTally(N) AS 
        (
            SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E16
        )
    insert into #table (BigId, TinyC)
    select N, case when N % 2 = 0 then null else N % 8 end
    from cteTally
    where N <= @Count;

    create unique index IX_BigID on #table (BigID);
    create index IX_TinyC on #table (TinyC);
end

set statistics io on
set statistics time on

print 'count(*)'
select count(*) from #table
option (maxdop 1)

print 'count(ID)'
select count(ID) from #table
option (maxdop 1)

print 'count(BigID)'
select count(BigID) from #table
option (maxdop 1)

print 'count(TinyC)'
select count(TinyC) from #table
option (maxdop 1)

set statistics io off
set statistics time off

--drop table #table

Performance

count(*)

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 735 ms, elapsed time = 746 ms

count(ID) -- int identity primary key clustered

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 765 ms, elapsed time = 776 ms

count(BigID) -- bigint not null, indexed

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 735 ms, elapsed time = 731 ms

count(TinyC) -- tinyint nullable, indexed

Warning: Null value is eliminated by an aggregate or other SET operation.

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 1593 ms, elapsed time = 1584 ms.

Answered Nov 03 '22 by Paul Williams


I would expect count(column) to be a bit slower, but that wasn't the case in a quick test (below). What's more important is the first link you posted, since count(*) and count(column) can produce different results when the column is nullable. Also, I'm assuming you aren't returning any other columns, which removes tests that would be unique to your environment and indexes.

WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS 
    (
        SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
select  N
into #table
from cteTally


update #table
set N = null
where N % 2 = 0

select count(*) from #table
select count(N) from #table
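The same experiment can be sketched outside SQL Server. This is an illustration using Python's sqlite3 module with a single-column table mirroring the tally above; it reproduces the row counts, not the SQL Server timings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
# 10,000 rows numbered 1..10000, like the tally CTE
conn.executemany("INSERT INTO t (n) VALUES (?)",
                 ((i,) for i in range(1, 10001)))
# Null out every even value, mirroring the UPDATE above
conn.execute("UPDATE t SET n = NULL WHERE n % 2 = 0")

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
non_null = conn.execute("SELECT COUNT(n) FROM t").fetchone()[0]
print(total, non_null)  # 10000 5000
```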

(screenshot of results omitted)

Similarly, on a table with 361,912 rows, here are the results for count(*), count(pk_column), and count(nullable_column), where the nullable column isn't part of any index:

(screenshot of results omitted)

Answered Nov 03 '22 by S3S