Is there any purpose for using both DISTINCT and GROUP BY in SQL?
Below is a sample query:
SELECT DISTINCT Actors FROM MovieDetails GROUP BY Actors
Does anyone know of any situations where both DISTINCT and GROUP BY need to be used together to get a specific desired result?
(The general usage of DISTINCT and GROUP BY separately is understood)
The GROUP BY clause collects the rows into sets so that all rows in each set share the same customer_num value. With no other columns selected, the result is a list of the unique customer_num values.
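DISTINCT removes duplicate rows from the result set, leaving only the unique records. For plain deduplication, SELECT DISTINCT is usually as fast as the equivalent GROUP BY, because many optimizers produce the same plan for both. This depends on the database, so check the actual execution plan rather than assuming it.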
DISTINCT finds unique records, whereas GROUP BY groups a selected set of rows into summary rows by one or more columns or expressions. The functional difference is thus clear. GROUP BY can also be used to find distinct values, when the query selects only the grouping columns and no aggregates.
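To illustrate the point about GROUP BY returning distinct values, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (a Movie and an Actors column) and the sample rows are assumptions made up for illustration; only the table and column names MovieDetails and Actors come from the question.

```python
import sqlite3

# Hypothetical in-memory stand-in for the MovieDetails table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieDetails (Movie TEXT, Actors TEXT)")
conn.executemany(
    "INSERT INTO MovieDetails VALUES (?, ?)",
    [("m1", "a"), ("m2", "a"), ("m3", "b"), ("m4", "b")],
)

def actors(sql):
    # Sort in Python so the comparison does not depend on row order.
    return sorted(row[0] for row in conn.execute(sql))

distinct_only = actors("SELECT DISTINCT Actors FROM MovieDetails")
group_by_only = actors("SELECT Actors FROM MovieDetails GROUP BY Actors")
both = actors("SELECT DISTINCT Actors FROM MovieDetails GROUP BY Actors")

print(distinct_only, group_by_only, both)
```

With this data, all three queries return the same two actors, which suggests that combining DISTINCT and GROUP BY adds nothing in the simple single-column case.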
GROUP BY lets you use aggregate functions such as AVG, MAX, MIN, SUM, and COUNT, whereas DISTINCT only removes duplicates. For example, grouping by department and selecting SUM(amount) gives you one row per department, containing the department name and the sum of the amount values in all rows for that department.
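The difference between aggregating and deduplicating can be sketched as follows, again with Python's sqlite3 module. The expenses table and its rows are hypothetical; the answer only mentions a department name and an amount column.

```python
import sqlite3

# Hypothetical table: the answer only mentions "department" and "amount".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?)",
    [("sales", 100), ("sales", 50), ("it", 70), ("it", 70)],
)

# GROUP BY with an aggregate: one summary row per department.
totals = sorted(conn.execute(
    "SELECT department, SUM(amount) FROM expenses GROUP BY department"
))

# DISTINCT: only identical (department, amount) rows are collapsed.
deduped = sorted(conn.execute(
    "SELECT DISTINCT department, amount FROM expenses"
))

print(totals)
print(deduped)
```

Here GROUP BY yields one row per department with its total, while DISTINCT still returns two rows for the sales department because their amounts differ.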
DISTINCT to remove duplicate GROUPING SETS from the GROUP BY clause
In a completely silly example using GROUPING SETS() in general (or the special grouping sets ROLLUP() or CUBE() in particular), you could use DISTINCT to remove the duplicate values produced by the grouping sets again:
SELECT DISTINCT actors FROM (VALUES('a'), ('a'), ('b'), ('b')) t(actors) GROUP BY CUBE(actors, actors)
With DISTINCT:
actors
------
NULL
a
b
Without DISTINCT:
actors
------
a
b
NULL
a
b
a
b
But why, apart from making an academic point, would you do that?
DISTINCT to find unique aggregate function values
In a less far-fetched example, you might be interested in the DISTINCT aggregated values, such as: how many different duplicate counts of actors are there?
SELECT DISTINCT COUNT(*) FROM (VALUES('a'), ('a'), ('b'), ('b')) t(actors) GROUP BY actors
Answer:
count
-----
2
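The effect is easier to see next to the same query without DISTINCT. Below is a small sketch using Python's sqlite3 module; since the derived-table alias t(actors) from the query above is not accepted by SQLite, the same values are supplied through a common table expression, which is an adaptation for this sketch and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same four rows as above, expressed as a CTE so that SQLite accepts them.
data = "WITH t(actors) AS (VALUES ('a'), ('a'), ('b'), ('b')) "

# One count per group: both actors appear twice.
without_distinct = conn.execute(
    data + "SELECT COUNT(*) FROM t GROUP BY actors"
).fetchall()

# DISTINCT is applied after grouping and aggregation, so equal counts collapse.
with_distinct = conn.execute(
    data + "SELECT DISTINCT COUNT(*) FROM t GROUP BY actors"
).fetchall()

print(without_distinct)
print(with_distinct)
```

Both groups produce the count 2, so DISTINCT reduces the two result rows to one.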
DISTINCT to remove duplicates with more than one GROUP BY column
Another case, of course, is this one:
SELECT DISTINCT actors, COUNT(*) FROM (VALUES('a', 1), ('a', 1), ('b', 1), ('b', 2)) t(actors, id) GROUP BY actors, id
With DISTINCT:
actors  count
-------------
a       2
b       1
Without DISTINCT:
actors  count
-------------
a       2
b       1
b       1
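The duplicate row appears because the query groups by a column (id) that it does not select. A minimal sketch with Python's sqlite3 module reproduces both results; as before, the values are passed through a common table expression because SQLite does not accept the t(actors, id) alias syntax, and that rewrite is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same rows as above; the CTE form is used only for SQLite compatibility.
data = (
    "WITH t(actors, id) AS "
    "(VALUES ('a', 1), ('a', 1), ('b', 1), ('b', 2)) "
)

# Groups are (a, 1), (b, 1), (b, 2); id is grouped on but not selected.
without_distinct = sorted(conn.execute(
    data + "SELECT actors, COUNT(*) FROM t GROUP BY actors, id"
))

# DISTINCT collapses the two ('b', 1) rows left after dropping id.
with_distinct = sorted(conn.execute(
    data + "SELECT DISTINCT actors, COUNT(*) FROM t GROUP BY actors, id"
))

print(without_distinct)
print(with_distinct)
```

The groups (b, 1) and (b, 2) each contain one row, so both project to ('b', 1) once id is left out of the SELECT list, and DISTINCT removes the repeat.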
For more details, I've written some blog posts, e.g. about GROUPING SETS and how they influence the GROUP BY operation, or about the logical order of SQL operations (as opposed to the lexical order of operations).