I am aware of `select count(distinct a)`, but I recently came across `select distinct count(a)`. I'm not sure that is even valid.
If it is valid, could you give me sample code with sample data that explains the difference? Hive doesn't allow the latter.
Any leads would be appreciated!
The query `select count(distinct a)` gives you the number of unique non-null values in `a`.
The query `select distinct count(a)` gives you the list of distinct counts. With a `group by`, that is one row per distinct group size; without grouping, it is just one row with the total count of non-null values in `a`.
See the following example:

```sql
create table t(a int);
insert into t values (1),(2),(3),(3);

select count(distinct a) from t;

select distinct count(a) from t
group by a;
```
The first query returns 3. The second returns the values 1 and 2, because the values 1 and 2 each occur once and the value 3 occurs twice.
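To try this without a Hive cluster, here is a minimal sketch that runs the same statements against an in-memory SQLite database through Python's standard `sqlite3` module. It assumes SQLite treats these queries the same way as your database; Hive's own behaviour is not checked here. It also runs `select distinct count(a)` without a `group by` to show the single-row case.

```python
import sqlite3

# In-memory SQLite database; table and data mirror the example above.
conn = sqlite3.connect(":memory:")
conn.execute("create table t(a int)")
conn.execute("insert into t values (1),(2),(3),(3)")

# Number of distinct non-null values of a.
distinct_values = conn.execute("select count(distinct a) from t").fetchone()[0]

# Distinct per-group counts when grouping by a (sorted, since row order is not guaranteed).
distinct_counts = sorted(
    row[0] for row in conn.execute("select distinct count(a) from t group by a")
)

# Without GROUP BY there is only one row, so DISTINCT changes nothing.
no_group = conn.execute("select distinct count(a) from t").fetchall()

print(distinct_values)  # 3
print(distinct_counts)  # [1, 2]
print(no_group)         # [(4,)]
```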
I cannot think of any useful situation where you would want to use `select distinct count(a)`.
If the query has no `group by`, then the `distinct` is anomalous: the query returns only one row anyway. If there is a `group by`, then the grouping columns should be in the `select` list to identify each row.
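A small sketch of why the grouping columns matter, again using Python's `sqlite3` with an in-memory database. The table `t2` and its columns `g` (grouping column) and `a` are hypothetical, made up for illustration. With only `distinct count(a)`, two groups of the same size collapse into one row and you cannot tell which group is which; adding `g` to the `select` list keeps one identifiable row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: g is the grouping column, a is the value being counted.
conn.execute("create table t2(g text, a int)")
conn.executemany(
    "insert into t2 values (?, ?)",
    [("x", 1), ("x", 2), ("y", 3), ("y", 4), ("z", 5)],
)

# Only the distinct per-group counts survive; groups x and y (both size 2) merge.
anonymous = sorted(
    row[0] for row in conn.execute("select distinct count(a) from t2 group by g")
)

# Putting the grouping column in the select list identifies each row.
per_group = conn.execute(
    "select g, count(a) from t2 group by g order by g"
).fetchall()

print(anonymous)  # [1, 2]
print(per_group)  # [('x', 2), ('y', 2), ('z', 1)]
```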
I mean, technically, with a `group by`, it would be answering the question: "which different counts of non-null values of `a` occur across the groups?" Usually, it is much more useful to know the count per group.
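To make the "non-null" part concrete, here is a sketch with hypothetical data (table `t3`, columns `g` and `a`, invented for illustration) run through Python's `sqlite3`. It assumes standard SQL semantics, where `count(a)` skips NULLs, so a group whose `a` values are all NULL gets a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: group x has one NULL, group z has only a NULL.
conn.execute("create table t3(g text, a int)")
conn.executemany(
    "insert into t3 values (?, ?)",
    [("x", 1), ("x", None), ("y", 2), ("y", 3), ("z", None)],
)

# count(a) skips NULLs, so each group's count is its number of non-null a values.
per_group = conn.execute(
    "select g, count(a) from t3 group by g order by g"
).fetchall()

# What select distinct count(a) ... group by g actually returns: the set of group sizes.
distinct_counts = sorted(
    row[0] for row in conn.execute("select distinct count(a) from t3 group by g")
)

print(per_group)        # [('x', 1), ('y', 2), ('z', 0)]
print(distinct_counts)  # [0, 1, 2]
```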
If you want to count the number of distinct values of `a`, then use `count(distinct a)`.
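For comparison, a sketch (Python `sqlite3`, in-memory database) contrasting `count(distinct a)` with `count(a)` and `count(*)`. The data reuses the values 1, 2, 3, 3 and adds one NULL row; the NULL is an assumption added here purely to show that `count(distinct a)` and `count(a)` ignore NULLs while `count(*)` counts every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t(a int)")
# Values 1, 2, 3, 3 plus one NULL (the NULL is added only to show NULL handling).
conn.executemany("insert into t values (?)", [(1,), (2,), (3,), (3,), (None,)])

# Distinct non-null values, non-null values, and all rows, in one pass.
row = conn.execute(
    "select count(distinct a), count(a), count(*) from t"
).fetchone()

print(row)  # (3, 4, 5)
```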