Why doesn't COUNT(DISTINCT (*)) work?

Tags: sql, sql-server

I am surprised that such a simple query does not work:

SELECT COUNT(DISTINCT *) FROM dbo.t_test

whereas both

SELECT COUNT(DISTINCT col1) FROM dbo.t_test

and

SELECT DISTINCT * FROM dbo.t_test

work.

What is the alternative?

EDIT:

DISTINCT * checks for uniqueness across the combined key (col1, col2, ...) and returns those rows. I expected COUNT(DISTINCT *) to simply return the number of such rows. Am I missing something here?

asked Feb 15 '11 at 22:02 by rkg




2 Answers

It doesn't work because you are only allowed to specify a single expression in COUNT(DISTINCT ...) as per the documentation:

COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )

If you look carefully you can see that the allowed grammar doesn't include COUNT(DISTINCT *).

The alternative is this:

SELECT COUNT(*) FROM
(
    SELECT DISTINCT * FROM dbo.t_test 
) T1
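To see this alternative at work without touching dbo.t_test, here is a small sketch that runs the same pattern against hypothetical inline sample data (the CTE name t_demo and its rows are illustrative assumptions, not part of the question). It also shows an equivalent GROUP BY form, and that DISTINCT treats NULLs as equal, so duplicate rows containing NULL collapse into one.

```sql
-- Hypothetical sample data: (1, 'a') and (2, NULL) each appear twice.
-- A CTE keeps the sketch self-contained; no table is created.
WITH t_demo (col1, col2) AS (
    SELECT 1, 'a'  UNION ALL
    SELECT 1, 'a'  UNION ALL
    SELECT 2, 'b'  UNION ALL
    SELECT 2, NULL UNION ALL
    SELECT 2, NULL
)
SELECT
    -- The alternative above: de-duplicate whole rows, then count them.
    -- DISTINCT treats the two (2, NULL) rows as duplicates of each other.
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT * FROM t_demo) AS T1) AS via_distinct,   -- 3
    -- Same result via GROUP BY on every column; NULLs form one group too.
    (SELECT COUNT(*)
     FROM (SELECT col1, col2 FROM t_demo
           GROUP BY col1, col2) AS T2) AS via_group_by;             -- 3
```

Spelling out the columns in the GROUP BY form makes explicit which columns define "distinct", which starts to matter once the table has a surrogate key that makes every row unique.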
answered Sep 30 '22 at 10:09 by Mark Byers


The truth of the matter is that neither SQL Server nor any other SQL implementation is supposed to do everything under the sun.

There are reasons to limit SQL syntax to certain elements, ranging from the parsing layer to query optimization to predictability of results to plain common sense.

The COUNT aggregate function is normally implemented as a streaming aggregate that gates on a single item: * (a plain record count, so a constant token will do), colname (increment only when the value is not NULL), or DISTINCT colname (a hash/bucket keyed on that one column).
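The three gates described above can be observed from the outside. The sketch below, again on hypothetical inline data (the CTE t_demo is an illustrative assumption), shows how COUNT(*), COUNT(col2) and COUNT(DISTINCT col2) differ on the same five rows. It only illustrates the observable semantics, not how any particular engine implements them internally.

```sql
-- Hypothetical sample data; col2 holds 'a', 'a', 'b', NULL, NULL.
WITH t_demo (col1, col2) AS (
    SELECT 1, 'a'  UNION ALL
    SELECT 1, 'a'  UNION ALL
    SELECT 2, 'b'  UNION ALL
    SELECT 2, NULL UNION ALL
    SELECT 2, NULL
)
SELECT
    COUNT(*)             AS all_rows,       -- 5: every row is counted
    COUNT(col2)          AS non_null_col2,  -- 3: NULLs are skipped
    COUNT(DISTINCT col2) AS distinct_col2   -- 2: only 'a' and 'b'
FROM t_demo;
```

SQL Server may also print a "Null value is eliminated by an aggregate" warning for the last two columns; it is informational and does not change the result.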

When you ask for COUNT(DISTINCT *), or for that matter COUNT(DISTINCT a, b, c), it can certainly be done for you if some RDBMS sees fit to implement it one day; but the feature (1) is uncommon enough, (2) adds work to the parser, and (3) adds complexity to the COUNT implementation.
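In SQL Server, the usual stand-in for COUNT(DISTINCT a, b, c) is the same derived-table pattern as Mark's answer, restricted to the columns of interest. The sketch below assumes a hypothetical table t_demo whose id column makes every row unique, so SELECT DISTINCT * would keep all five rows while DISTINCT a, b, c keeps only three combinations.

```sql
-- Hypothetical table: id makes every row unique, so DISTINCT * would keep all five.
WITH t_demo (id, a, b, c) AS (
    SELECT 1, 1, 'x', 10   UNION ALL
    SELECT 2, 1, 'x', 10   UNION ALL
    SELECT 3, 1, 'y', 10   UNION ALL
    SELECT 4, 2, 'x', NULL UNION ALL
    SELECT 5, 2, 'x', NULL
)
-- Emulates COUNT(DISTINCT a, b, c): the distinct combinations are
-- (1, 'x', 10), (1, 'y', 10) and (2, 'x', NULL).
SELECT COUNT(*) AS distinct_abc   -- 3
FROM (SELECT DISTINCT a, b, c FROM t_demo) AS d;
```

MySQL does accept COUNT(DISTINCT a, b, c) directly, but it ignores any row in which one of the listed expressions is NULL, so it would not count the (2, 'x', NULL) combination that the derived-table form includes.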

Mark has the correct alternative.

answered Sep 30 '22 at 10:09 by RichardTheKiwi