Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JOIN, GROUP BY, ORDER BY

The problem I first had with the following query was that the group by clause was performed before the order by:

The saved.recipe_id column is an integer generated by UNIX_TIMESTAMP()

SELECT
    saved.recipe_id,
    saved.`date`,
    user.user_id
FROM saved
    JOIN user
        ON user.id = saved.user_id
GROUP BY saved.recipe_id
ORDER BY saved.`date` DESC

So I tried all sorts of different possible solution with sub queries and other bs. In the end I ended up with trying out some different sub queries in the join clause witch required me to change the table order from the from clause to the join clause. I decided to just try the following out:

SELECT
    saved.recipe_id,
    saved.`date`,
    user.user_id
FROM user
    JOIN saved
        ON user.id = saved.user_id
GROUP BY saved.recipe_id
ORDER BY saved.`date` DESC

For some reason this seems to order correctly, but why?
How can this change make my query sort more correctly then before?
Does it really? or is it just happen to do it for the test cases I put it up against?

like image 848
superhero Avatar asked Sep 12 '13 11:09

superhero


People also ask

Can I use GROUP BY and ORDER BY together?

Using Group By and Order By Together When combining the Group By and Order By clauses, it is important to bear in mind that, in terms of placement within a SELECT statement: The GROUP BY clause is placed after the WHERE clause. The GROUP BY clause is placed before the ORDER BY clause.

Does GROUP BY do ORDER BY?

group by does not order the data neccessarily. A DB is designed to grab the data as fast as possible and only sort if necessary. So add the order by if you need a guaranteed order.

Is GROUP BY before or after join?

Using Group By with Inner Join Group by works conventionally with Inner Join on the final result returned after joining two or more tables. If you are not familiar with Group by clause in SQL, I would suggest going through this to have a quick understanding of this concept.


2 Answers

So the problem I first had with the following query was that the group by clause was performed before the order by:

This is not a problem. This is how SQL is defined and how it operates. The group by creates a new set of rows and order by orders those rows.

There is no ordering issue here. There is an "understanding of SQL" issue. Your order by is only ordering the results of the query. These results are produced by the group by, and the order o fthe joins has nothing to do with the results.

You are using a MySQL extension called Hidden Columns. This is when you have an aggregation query that has columns in the select (or having or order by clauses) that are not part of aggregation functions (sum(), etc) or part of the group by. Here is a quote from the documentation:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.

Presumably, you want the most recent date and user associated with that. The following query does what you want correctly and consistently:

SELECT saved.recipe_id, max(saved.`date`) as MostRecentDate,
       substring_index(group_concat(user.user_id), ',', 1) as MostRecentUser
FROM user JOIN
     saved
     ON user.id = saved.user_id
GROUP BY saved.recipe_id
ORDER BY max(saved.`date`) DESC;
like image 151
Gordon Linoff Avatar answered Oct 02 '22 16:10

Gordon Linoff


From what I remember GROUP BY is always performed before ORDER BY. If you retrieve any column which is not in any aggregation function the result for that column will be random. Your correct order from second query is accidental.

Instead saved.date use MAX(saved.date)

Then you will get determined order from every singe group and then you will order that determined results.

like image 37
MKB Avatar answered Oct 02 '22 15:10

MKB