sql group by versus distinct

Why would someone use GROUP BY rather than DISTINCT when the query does no aggregation?

Also, does anyone know how GROUP BY and DISTINCT compare for performance in MySQL and SQL Server? I'm guessing that SQL Server has the better optimizer and that the two might be close to equivalent there, but in MySQL I expect DISTINCT to have a significant performance advantage.

I'm interested in DBA answers.

EDIT:

Bill's post is interesting, but not applicable. Let me be more specific...

select a, b, c from x group by a, b, c

versus

select distinct a, b, c from x

asked Jan 09 '09 01:01

mson

2 Answers

GROUP BY collapses each group of rows into one row per distinct value of the specified columns, and those columns don't even have to appear in the select-list.

SELECT b, c, d FROM table1 GROUP BY a;

This query is legal in MySQL, but it is not standard SQL and most other brands reject it. MySQL accepts it and trusts that you know what you're doing: that b, c, and d are selected unambiguously because they are functionally dependent on a. (Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so MySQL accepts the query only when it can verify that dependency itself.)

However, Microsoft SQL Server and other brands don't allow this query, because they can't easily determine the functional dependencies. Instead, standard SQL requires you to follow the Single-Value Rule: every column in the select-list must either be named in the GROUP BY clause or be an argument to a set function.
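As a rough illustration of the Single-Value Rule, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL and SQL Server here, and the sample rows and the trimmed-down table1 (columns a, b, d) are made up for the example. SQLite happens to accept the loose form too, so both variants can be run side by side.

```python
import sqlite3

# Hypothetical sample data: b is functionally dependent on a, d is not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "x", 10), (1, "x", 20), (2, "y", 30)],
)

# Loose form: d is neither grouped nor aggregated. SQLite accepts this,
# but which d it returns for a = 1 is not something to rely on.
loose_rows = conn.execute(
    "SELECT a, b, d FROM table1 GROUP BY a ORDER BY a"
).fetchall()

# Form that follows the Single-Value Rule: every selected column is either
# named in GROUP BY (a, b) or is an argument to a set function (MIN(d)).
standard_rows = conn.execute(
    "SELECT a, b, MIN(d) FROM table1 GROUP BY a, b ORDER BY a"
).fetchall()

print(len(loose_rows))   # one row per distinct a
print(standard_rows)     # [(1, 'x', 10), (2, 'y', 30)]
```

The second query should run unchanged on any brand, because it never asks the engine to guess which value of d to return for a group.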

DISTINCT, by contrast, always looks at all columns in the select-list, and only those columns. It's a common misconception that DISTINCT allows you to specify the columns:

SELECT DISTINCT(a), b, c FROM table1;

Despite the parentheses making DISTINCT look like a function call, it is not one. It's a query option, and a distinct value in any of the three columns of the select-list will lead to a distinct row in the query result. One of the expressions in this select-list has parentheses around it, but that doesn't affect the result.
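A quick way to convince yourself of this is to run both spellings against the same data. The sketch below uses Python's built-in sqlite3 module as a stand-in database, with made-up rows. The value a = 1 appears in two result rows, which shows that DISTINCT was not applied to a alone.

```python
import sqlite3

# Hypothetical sample data for table1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 2, 2)],
)

# (a) is just a parenthesized expression; DISTINCT still covers a, b and c.
with_parens = conn.execute(
    "SELECT DISTINCT(a), b, c FROM table1 ORDER BY a, b, c"
).fetchall()

without_parens = conn.execute(
    "SELECT DISTINCT a, b, c FROM table1 ORDER BY a, b, c"
).fetchall()

print(with_parens)                    # [(1, 1, 1), (1, 2, 1), (2, 2, 2)]
print(with_parens == without_parens)  # True
```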

answered Sep 24 '22 11:09

Bill Karwin

A little (VERY little) empirical data from MS SQL Server, on a couple of random tables from our DB.

For the pattern:

SELECT col1, col2 FROM table GROUP BY col1, col2

and

SELECT DISTINCT col1, col2 FROM table

When there's no covering index for the query, both ways produced the following query plan:

|--Sort(DISTINCT ORDER BY:([table].[col1] ASC, [table].[col2] ASC))
    |--Clustered Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]))

and when there was a covering index, both produced:

|--Stream Aggregate(GROUP BY:([table].[col1], [table].[col2]))
    |--Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]), ORDERED FORWARD)

So, from that very small sample, SQL Server appears to treat both the same.
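If you want to repeat this kind of comparison yourself, the sketch below shows the general shape of the experiment using Python's built-in sqlite3 module. This is SQLite, not SQL Server, so the plans it prints are not the ones quoted above and say nothing about SQL Server's or MySQL's optimizer. The table t and its rows are made up. What it does confirm is that the two patterns return the same rows.

```python
import sqlite3

# Hypothetical table and data; col3 exists only so the query is a projection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "a", 0.5), (1, "a", 1.5), (1, "b", 2.5), (2, "a", 3.5), (2, "a", 4.5)],
)

queries = {
    "group by": "SELECT col1, col2 FROM t GROUP BY col1, col2",
    "distinct": "SELECT DISTINCT col1, col2 FROM t",
}

# Sort in Python so the comparison does not depend on the engine's row order.
group_by_rows = sorted(conn.execute(queries["group by"]).fetchall())
distinct_rows = sorted(conn.execute(queries["distinct"]).fetchall())
print(group_by_rows == distinct_rows)  # True

# Print SQLite's plan for each query; the text varies between SQLite versions.
for label, sql in queries.items():
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, [row[-1] for row in plan])
```

On SQL Server or MySQL you would run the same two statements and compare their execution plans with that engine's own tooling instead of EXPLAIN QUERY PLAN.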

answered Sep 22 '22 11:09

Cowan

Tags: performance, sql-server, mysql, group-by, distinct
