I'm running a query like this in MS SQL Server 2008:
select count(*)
from t1
inner join t2 on t1.id = t2.t1_id
inner join t3 on t1.id = t3.t1_id
Assume t1.id has a NOT NULL constraint. Since they're inner joins and t1.id can never be null, using count(t1.id) instead of count(*) should produce the exact same end result. My question is: Would the performance be the same?
I'm also wondering whether the joins could affect this. I realize that adding or removing a join will affect both performance and the length of the result set. Suppose that, without changing the join pattern, you set count to target only one table. Would it make any difference? In other words, is there a difference between these two queries:
select count(*) from t1 inner join t2 on t1.id = t2.t1_id
select count(t1.*) from t1 inner join t2 on t1.id = t2.t1_id
"COUNT(id) vs. COUNT(*) in MySQL" answers this question for MySQL, but I couldn't find answers for MS-SQL specifically, and I can't find anything at all that takes the join factor into account.
NOTE: I tried to find this information on both Google and SO, but it was difficult to figure out how to word my search.
The simple answer is no: there is no difference at all. COUNT(*) counts every row in the result set, including rows that contain NULL values.
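To make the NULL handling concrete, here is a minimal sketch that runs the question's join shape against an in-memory SQLite database from Python. SQLite stands in for SQL Server here purely as an assumption of convenience, and the `note` column and the sample rows are made up for illustration. It only shows the counting semantics, not anything about SQL Server performance or execution plans.

```python
import sqlite3

# In-memory SQLite database; a stand-in for SQL Server, used only to show semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER NOT NULL PRIMARY KEY, note TEXT);  -- note is nullable (hypothetical column)
    CREATE TABLE t2 (t1_id INTEGER NOT NULL);
    INSERT INTO t1 (id, note) VALUES (1, 'a'), (2, NULL), (3, 'c');
    INSERT INTO t2 (t1_id) VALUES (1), (1), (2), (3);
""")

# Same join pattern as in the question, counting three different things at once.
count_star, count_id, count_note = conn.execute("""
    SELECT COUNT(*), COUNT(t1.id), COUNT(t1.note)
    FROM t1
    INNER JOIN t2 ON t1.id = t2.t1_id
""").fetchone()

# COUNT(*) and COUNT(t1.id) agree because t1.id is NOT NULL;
# COUNT(t1.note) skips the joined row where note IS NULL.
print(count_star, count_id, count_note)  # prints: 4 4 3
```

So the two forms in the question return the same number only because t1.id cannot be NULL; counting a nullable column can give a smaller result.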
A SELECT is equally efficient in terms of speed whether you use * or name the columns. The difference is in memory use, not speed.
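A commonly repeated theory says that COUNT(*) reads all columns to count rows, while COUNT(1) counts using the first column (the primary key) and can therefore use an index and run much faster. That theory is wrong: the 1 in COUNT(1) is a constant expression, not a column position, and SQL Server treats COUNT(1) and COUNT(*) the same way.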
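The "COUNT(1) counts the first column" idea is easy to check at the level of results. The sketch below again uses an in-memory SQLite database from Python as a stand-in (an assumption, not SQL Server), with a hypothetical table `demo` whose first column is mostly NULL. If the literal 1 referred to the first column, COUNT(1) would skip those NULLs; it does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: the FIRST column is nullable and mostly NULL.
    CREATE TABLE demo (first_col TEXT, id INTEGER NOT NULL);
    INSERT INTO demo (first_col, id) VALUES (NULL, 1), (NULL, 2), ('x', 3);
""")

count_star, count_one, count_first = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(first_col) FROM demo"
).fetchone()

# COUNT(1) evaluates the constant 1 for every row, so it matches COUNT(*).
# Only COUNT(first_col) ignores the rows where first_col IS NULL.
print(count_star, count_one, count_first)  # prints: 3 3 1
```

This only demonstrates that the results match; whether the two forms get the same execution plan on a given SQL Server version is something to confirm by comparing the actual plans there.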
So to make SELECT COUNT(*) queries fast, here's what to do: Get on any version that supports batch mode on columnstore indexes, and put a columnstore index on the table – although your experiences are going to vary dramatically depending on the kind of query you have.
I tried a few SELECT COUNT(*) FROM MyTable vs. SELECT COUNT(SomeColumn) FROM MyTable queries with various sizes of tables, where SomeColumn is once a clustering key column, once in a non-clustered index, and once in no index at all.
In all cases, with all sizes of tables (from 300,000 rows to 170 million rows), I never saw any difference in either speed or execution plan. In all cases, the COUNT is handled by doing a clustered index scan, i.e. basically scanning the whole table. If there is a non-clustered index involved, then the scan is on that index, even when doing a SELECT COUNT(*)!
There doesn't seem to be any difference in speed or in how the rows are counted. To count them all, SQL Server just needs to scan the whole table or index, period.
Tests were done on SQL Server 2008 R2 Developer Edition