Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's faster, SELECT DISTINCT or GROUP BY in MySQL?

People also ask

Why is GROUP BY faster than distinct?

DISTINCT would usually be faster than GROUP BY if a) there's no index on that column and b) you are not ordering as well since GROUP BY does both filtering and ordering.

Is distinct or GROUP BY more efficient?

In summary: GROUP BY is slightly faster than SELECT DISTINCT.

Should we use distinct or GROUP BY?

When and where to use GROUP BY and DISTINCT. DISTINCT is used to filter unique records out of the records that satisfy the query criteria. The "GROUP BY" clause is used when you need to group the data and it should be used to apply aggregate operators to each group.

Is select faster than select distinct?

Most of the SELECT DISTINCT queries will perform exactly as fast as their simple SELECT counterparts, because the optimizer will do away with the step necessary for eliminating duplicates.


They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).

If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.

When in doubt, test!


If you have an index on profession, these two are synonyms.

If you don't, then use DISTINCT.

GROUP BY in MySQL sorts results. You can even do:

SELECT u.profession FROM users u GROUP BY u.profession DESC

and get your professions sorted in DESC order.

DISTINCT creates a temporary table and uses it for storing duplicates. GROUP BY does the same, but sortes the distinct results afterwards.

So

SELECT DISTINCT u.profession FROM users u

is faster, if you don't have an index on profession.


All of the answers above are correct, for the case of DISTINCT on a single column vs GROUP BY on a single column. Every db engine has its own implementation and optimizations, and if you care about the very little difference (in most cases) then you have to test against specific server AND specific version! As implementations may change...

BUT, if you select more than one column in the query, then the DISTINCT is essentially different! Because in this case it will compare ALL columns of all rows, instead of just one column.

So if you have something like:

// This will NOT return unique by [id], but unique by (id,name)
SELECT DISTINCT id, name FROM some_query_with_joins

// This will select unique by [id].
SELECT id, name FROM some_query_with_joins GROUP BY id

It is a common mistake to think that DISTINCT keyword distinguishes rows by the first column you specified, but the DISTINCT is a general keyword in this manner.

So people you have to be careful not to take the answers above as correct for all cases... You might get confused and get the wrong results while all you wanted was to optimize!


Go for the simplest and shortest if you can -- DISTINCT seems to be more what you are looking for only because it will give you EXACTLY the answer you need and only that!


well distinct can be slower than group by on some occasions in postgres (dont know about other dbs).

tested example:

postgres=# select count(*) from (select distinct i from g) a;

count 

10001
(1 row)

Time: 1563,109 ms

postgres=# select count(*) from (select i from g group by i) a;

count
10001
(1 row)

Time: 594,481 ms

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I

so be careful ... :)


Group by is expensive than Distinct since Group by does a sort on the result while distinct avoids it. But if you want to make group by yield the same result as distinct give order by null ..

SELECT DISTINCT u.profession FROM users u

is equal to

SELECT u.profession FROM users u GROUP BY u.profession order by null

It seems that the queries are not exactly the same. At least for MySQL.

Compare:

  1. describe select distinct productname from northwind.products
  2. describe select productname from northwind.products group by productname

The second query gives additionally "Using filesort" in Extra.