I am surprised that such a simple query does not work:
SELECT COUNT(DISTINCT *) FROM dbo.t_test
whereas
SELECT COUNT(DISTINCT col1) FROM dbo.t_test
and
SELECT DISTINCT * FROM dbo.t_test
both work.
What is the alternative?
EDIT:
DISTINCT * checks for uniqueness of the combined key (col1, col2, ...) and returns those rows. I expected COUNT(DISTINCT *) to simply return the number of such rows. Am I missing anything here?
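A quick way to see what DISTINCT * actually compares is to run it on a small table. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption: SQLite has no dbo schema, and the table contents are made up). The two duplicate (1, 'a') rows collapse into one, (1, 'b') survives because col2 differs, and the two (NULL, 'a') rows also collapse, because DISTINCT treats NULLs as equal to each other.

```python
import sqlite3

# Hypothetical stand-in for dbo.t_test (SQLite has no "dbo" schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t_test (col1, col2) VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "a"), (None, "a"), (None, "a")],
)

# DISTINCT * treats the whole row (col1, col2) as one key:
# exact duplicates collapse, rows differing in any column are kept,
# and rows whose NULLs line up are considered duplicates too.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM t_test ORDER BY col1, col2"
).fetchall()
print(distinct_rows)
```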
Yes, you can use COUNT() and DISTINCT together, but only around a single expression: COUNT(DISTINCT col1) returns the number of distinct non-NULL values of col1, not the number of distinct rows.
The DISTINCT keyword in the SELECT clause eliminates duplicate rows and returns a unique list of values; in other words, it retrieves the unique rows of a table.
COUNT(*) returns the number of rows in the table or view. It counts all rows, including ones that contain duplicate column values or NULL values.
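A minimal sketch of the three COUNT forms side by side, again assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server and a hypothetical two-column t_test:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t_test VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (None, "c")],
)

# COUNT(*)             -> every row, duplicates and NULLs included
# COUNT(col1)          -> only rows where col1 is not NULL
# COUNT(DISTINCT col1) -> distinct non-NULL values of col1
count_star, count_col1, count_distinct_col1 = conn.execute(
    "SELECT COUNT(*), COUNT(col1), COUNT(DISTINCT col1) FROM t_test"
).fetchone()
print(count_star, count_col1, count_distinct_col1)  # 4 3 2
```

COUNT(*) sees all four rows, COUNT(col1) skips the NULL, and COUNT(DISTINCT col1) additionally folds the two 1s together.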
It doesn't work because COUNT(DISTINCT ...) accepts only a single expression, as per the documentation:
COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )
If you look carefully, you can see that the grammar allows DISTINCT only in front of a single expression and allows * only on its own, so COUNT(DISTINCT *) is not included.
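SQLite's grammar is not SQL Server's, but it draws the same line, which makes it a convenient place to try the forms out. The sketch below assumes Python's sqlite3 module and a hypothetical empty t_test; is_rejected is a helper introduced only for this illustration. It reports which statements the engine refuses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (col1 INTEGER, col2 TEXT)")


def is_rejected(sql: str) -> bool:
    """Hypothetical helper: True if SQLite raises an error for the statement."""
    try:
        conn.execute(sql).fetchall()
    except sqlite3.Error:
        return True
    return False


for sql in (
    "SELECT COUNT(*) FROM t_test",
    "SELECT COUNT(DISTINCT col1) FROM t_test",
    "SELECT COUNT(DISTINCT *) FROM t_test",           # DISTINCT may not precede *
    "SELECT COUNT(DISTINCT col1, col2) FROM t_test",  # only one expression allowed
):
    print(f"{sql!r}: {'rejected' if is_rejected(sql) else 'accepted'}")
```

The last two are rejected; the exact error message differs between the two cases, but neither form is silently treated as a whole-row or multi-column key.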
The alternative is this:
SELECT COUNT(*) FROM
(
SELECT DISTINCT * FROM dbo.t_test
) T1
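The same workaround, checked end to end. This sketch assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server and a made-up t_test containing two duplicate rows. The derived-table query is the one above, with AS T1 spelled out (SQL Server requires an alias on a derived table, which is what T1 is for):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t_test VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "a"), (2, "a")],
)

# Every row, duplicates included.
total_rows = conn.execute("SELECT COUNT(*) FROM t_test").fetchone()[0]

# De-duplicate whole rows in a derived table, then count what is left:
# the number COUNT(DISTINCT *) was expected to return.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM t_test) AS T1"
).fetchone()[0]

print(total_rows, distinct_rows)  # 5 3
```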
The truth of the matter is that neither SQL Server nor any other SQL implementation is supposed to do everything under the sun.
There are good reasons to limit SQL syntax to certain elements, ranging from the parsing layer and query optimization to predictability of results and plain common sense.
The COUNT aggregate function is normally implemented as a streaming aggregate with a gate for a single item, be it * (a record count: just use a static token), colname (increment the count only when the value is not NULL), or DISTINCT colname (a hash/bucket with one key).
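To make the "gate" idea concrete, here is a toy model of the three cases in plain Python. It is a hypothetical illustration of the description above, not SQL Server's actual aggregate code; NULL is modeled as None:

```python
from typing import Any, Iterable, Optional


def count_star(rows: Iterable[tuple]) -> int:
    """COUNT(*): bump the counter for every row, whatever it contains."""
    n = 0
    for _ in rows:
        n += 1
    return n


def count_col(values: Iterable[Optional[Any]]) -> int:
    """COUNT(col): bump the counter only when the value is not NULL (None)."""
    n = 0
    for v in values:
        if v is not None:
            n += 1
    return n


def count_distinct_col(values: Iterable[Optional[Any]]) -> int:
    """COUNT(DISTINCT col): a hash set keyed on one value; NULLs are skipped."""
    seen: set = set()
    for v in values:
        if v is not None:
            seen.add(v)
    return len(seen)


rows = [(1, "a"), (1, "a"), (2, "b"), (None, "c")]
col1 = [r[0] for r in rows]
print(count_star(rows), count_col(col1), count_distinct_col(col1))  # 4 3 2
```

Each gate looks at one item per row and keeps either a single counter or a single-key hash set, which hints at why a whole-row or multi-column DISTINCT does not slot in without extra machinery.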
When you ask for COUNT(DISTINCT *), or for that matter COUNT(DISTINCT a,b,c), yes, it can surely be done for you if some RDBMS sees fit to implement it (MySQL, for one, already accepts COUNT(DISTINCT expr, [expr ...])). But the feature (1) is uncommon enough, (2) adds work to the parser, and (3) adds complexity to the COUNT implementation.
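For engines without the multi-column form, the usual workaround is the same derived-table trick as for DISTINCT *, applied to just the columns of interest. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical three-column t_test, and cross-checks the SQL against a Python set of (a, b) tuples, i.e. the single-key hash idea with a composite key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (a INTEGER, b INTEGER, c INTEGER)")
rows = [(1, 1, 10), (1, 1, 20), (1, 2, 10), (2, 1, 10), (None, 1, 10)]
conn.executemany("INSERT INTO t_test VALUES (?, ?, ?)", rows)

# Stand-in for the unsupported COUNT(DISTINCT a, b):
# de-duplicate the (a, b) pairs in a derived table, then count them.
via_subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM t_test) AS T1"
).fetchone()[0]

# Same idea in Python: a hash set keyed on the (a, b) tuple.
via_tuple_set = len({(a, b) for a, b, _ in rows})

print(via_subquery, via_tuple_set)  # 4 4
```

One caveat: the (NULL, 1) pair is counted here, because DISTINCT groups NULLs together, whereas COUNT(DISTINCT col) skips NULLs entirely. The two approaches can therefore disagree when the columns contain NULLs.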
Mark has the correct alternative.