Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does ORDER BY apply before or after DISTINCT?

In a MySQL query, when using the DISTINCT option, does ORDER BY apply after the duplicates are removed? If not, is there any way to make it do so? I think it's causing some issues with my code.

EDIT:
Here's some more information about what's causing my problem. I understand that, at first glance, this order would not be important, since I am dealing with duplicate rows. However, this is not entirely the case, since I am using an INNER JOIN to sort the rows.

Say I have a table of forum threads, containing this data:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
+----+--------+-------------+

I also have a set of posts in another table like this:

+----+----------+--------+---------+
| id | threadid | userid | content |
+----+----------+--------+---------+
|  1 |        1 |      1 | Lorem   |
|  2 |        1 |      2 | Ipsum   |
|  3 |        2 |      2 | Test    |
|  4 |        3 |      1 | Foo     |
|  5 |        2 |      3 | Bar     |
|  6 |        3 |      5 | Bob     |
|  7 |        1 |      2 | Joe     |
+----+----------+--------+---------+

I am using the following MySQL query to get all threads, then sort them based on the latest post (assuming that posts with higher ids are more recent:

SELECT t.*
FROM Threads t
INNER JOIN Posts p ON t.id = p.threadid
ORDER BY p.id DESC

This works, and generates something like this:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
|  1 |      1 | Information |
+----+--------+-------------+

However, as you can see, the information is correct, but there are duplicate rows. I'd like to remove such duplicates, so I used SELECT DISTINCT instead. However, this yielded the following:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
+----+--------+-------------+

This is obviously wrong, since the "Information" thread should be on top. It would seem that using DISTINCT causes the duplicates to be removed from the top to the bottom, so only the final rows are left. This causes some issues in the sorting.

Is this the case, or am I analyzing things incorrectly?

like image 680
Alexis King Avatar asked Jun 05 '12 21:06

Alexis King


1 Answers

Two things to understand:

  1. Generally speaking, resultsets are unordered unless you specify an ORDER BY clause; to the extent that you specify a non-strict order (i.e. ORDER BY over non-unique columns), the order in which records that are equal under that ordering appear within the resultset is undefined.

    I suspect you may be specifying such a non-strict order, which is the root of your problems: ensure that your ordering is strict by specifying ORDER BY over a set of columns that is sufficient to uniquely identify each record for which you care about its final position in the resultset.

  2. DISTINCT may use GROUP BY, which causes the results to be ordered by the grouped columns; that is, SELECT DISTINCT a, b, c FROM t will produce a resultset that appears as though ORDER BY a, b, c has been applied. Again, specifying a sufficiently strict order to meet your needs will override this effect.


Following your update, bearing in mind my point #2 above, it is clear that the effect of grouping the results to achieve DISTINCT makes it impossible to then order by the non-grouped column p.id; instead, you want:

SELECT   t.*
FROM     Threads t INNER JOIN Posts p ON t.id = p.threadid
GROUP BY t.id
ORDER BY MAX(p.id) DESC
like image 120
eggyal Avatar answered Sep 28 '22 00:09

eggyal