Let's assume I have a table called Cars with 2 columns: CarName, BrandName.
Now I want to execute this query:
select CarName
from Cars
order by BrandName
As you can see, I'd like to return a list sorted by a column that is not present in the SELECT part of the query.
The basic (not optimized) logical execution sequence of SQL clauses is: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
The problem is that BrandName isn't part of what is left after the SELECT clause has been executed.
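The clause sequence above is a logical model, not a physical one. That is why it is not really an obstacle. The Python sketch below is a simplified model and not how any particular engine is implemented. It reads the sort key from each complete source row and only afterwards projects down to the selected columns, so the ORDER BY column never has to survive the SELECT. The function name `select_order_by` and the car and brand values are hypothetical, chosen only for illustration.

```python
# Hypothetical rows for the Cars table; the values are illustrative only.
cars = [
    {"CarName": "Golf", "BrandName": "Volkswagen"},
    {"CarName": "A4", "BrandName": "Audi"},
    {"CarName": "Corolla", "BrandName": "Toyota"},
]


def select_order_by(rows, select_cols, order_cols):
    """Simplified model of SELECT <select_cols> FROM rows ORDER BY <order_cols>.

    The sort keys are read from the full source rows, and the projection
    down to select_cols is applied only afterwards, so the ORDER BY
    columns do not need to appear in the select list.
    """
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in order_cols))
    return [tuple(r[c] for c in select_cols) for r in ordered]


print(select_order_by(cars, ["CarName"], ["BrandName"]))
# [('A4',), ('Corolla',), ('Golf',)]
```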
I've searched for this in books, on Google and on Stack Overflow, but so far I've only found several SO comments like "I know of database systems that don't allow it, but I don't remember which one".
So my questions are:
1) What do the SQL-92 or SQL99 standards say about this?
2) Which databases allow this query and which don't?
(Background: A couple of students asked this, and I want to give them the best answer possible)
EDIT:
- Successfully tested on Microsoft SQL Server 2012
Yes, you can order by a field (or fields) even if it is not in your SELECT statement, as long as it exists in your table. With a GROUP BY clause, though, the field you order by generally has to be one of the grouping columns or be used inside an aggregate function. There's another exception: when you're using SELECT DISTINCT, the fields used in the ORDER BY clause must be included in the select list.
SQL queries initiated by a SELECT statement support the ORDER BY clause, which sorts the result set in ascending or descending order based on one or more columns (or on a condition, an expression, etc.).
Your query is perfectly legal syntax; you can order by columns that are not present in the select.
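As a quick check in a real engine, the question's query can be run against SQLite through Python's built-in `sqlite3` module. This is a minimal sketch. The sample rows are hypothetical, and it only shows SQLite's behaviour, not that of any other DBMS.

```python
import sqlite3

# In-memory database holding the Cars table from the question.
# The rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (CarName TEXT, BrandName TEXT)")
conn.executemany(
    "INSERT INTO Cars (CarName, BrandName) VALUES (?, ?)",
    [("Golf", "Volkswagen"), ("A4", "Audi"), ("Corolla", "Toyota")],
)

# BrandName is used only for sorting; it is not in the select list.
rows = conn.execute("SELECT CarName FROM Cars ORDER BY BrandName").fetchall()
print(rows)  # [('A4',), ('Corolla',), ('Golf',)]
conn.close()
```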
If you need the full specification of what is legal in an ORDER BY, the SQL Standard 2003 has a long list of rules about what the ORDER BY should and shouldn't contain (02-Foundation, page 415, section 7.13 <Query expression>, sub part 28). These rules confirm that your query is legal syntax.
I think your confusion could be arising from selecting and/or ordering by columns not present in the GROUP BY, or from ordering by columns not in the SELECT when using DISTINCT.
Both have the same fundamental problem, and MySQL is the only one to my knowledge that allows either.
The problem is this: when using GROUP BY or DISTINCT, any columns not contained in either are normally not needed, so it doesn't matter that they may have several different values across the rows of a group. Once you select or order by such a column, though, you need one value per group, and there may be several to choose from. Imagine this simple data set:
ID | Column1 | Column2
---+---------+--------
1  | A       | X
2  | A       | Z
3  | B       | Y
If you write:
SELECT DISTINCT Column1
FROM T;
You would get
Column1
---------
A
B
If you then add ORDER BY Column2, which of the two Column2 values would you use to order A by, X or Z? There is no deterministic way to choose a value for Column2.
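The ambiguity can be made concrete with a small Python sketch over the sample table above, stored as plain tuples (an assumption of this sketch, not part of the original). It collects the Column2 values behind each distinct Column1 value. It then shows that the final order of A and B depends on which of A's two values is picked as its sort key.

```python
# The sample table T as (ID, Column1, Column2) tuples.
T = [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")]

# Collect every Column2 value behind each distinct Column1 value.
candidates = {}
for _id, col1, col2 in T:
    candidates.setdefault(col1, set()).add(col2)

print({k: sorted(v) for k, v in candidates.items()})
# {'A': ['X', 'Z'], 'B': ['Y']}

# For each possible choice of A's sort key, show the resulting order of
# SELECT DISTINCT Column1 ... ORDER BY Column2 (B has only one candidate, Y).
for pick in sorted(candidates["A"]):
    sort_keys = {"A": pick, "B": "Y"}
    print(pick, sorted(sort_keys, key=sort_keys.get))
# X ['A', 'B']
# Z ['B', 'A']
```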
The same applies to selecting columns not in the GROUP BY. To simplify things, just imagine the first two rows of the previous table:
ID | Column1 | Column2
---+---------+--------
1  | A       | X
2  | A       | Z
In MySQL you can write
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1;
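This actually breaks the SQL standard, but it works in MySQL (with the default settings of versions before 5.7.5; later versions enable ONLY_FULL_GROUP_BY by default, which rejects it). The trouble is that it is non-deterministic; the result: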
ID | Column1 | Column2
---+---------+--------
1  | A       | X
Is no more or less correct than
ID | Column1 | Column2
---+---------+--------
2  | A       | Z
So what you are saying is "give me one row for each distinct value of Column1", which both result sets satisfy. So how do you know which one you will get? Well, you don't. It seems to be a fairly popular misconception that you can add an ORDER BY clause to influence the results, so that, for example, the following query:
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1
ORDER BY ID DESC;
Would ensure that you get the following result:
ID | Column1 | Column2
---+---------+--------
2  | A       | Z
because of the ORDER BY ID DESC. However, this is not true (as demonstrated here).
The MySQL documentation states:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.
So even though you have an ORDER BY, it does not apply until after one row per group has been selected, and that one row is non-deterministic.
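The Python model below is hypothetical. `group_by_first_seen` stands in for a permissive GROUP BY that keeps whichever row of each group it meets first. This is just one possible strategy, since real engines promise nothing about which row they keep. `order_by_id_desc` then sorts the already-grouped result. Feeding in the same two rows in different physical orders gives different answers, and the ORDER BY ID DESC step cannot change which row was kept.

```python
# The first two rows of the sample table as (ID, Column1, Column2).
# Both lists hold the same rows, just in a different physical order.
scan_1 = [(1, "A", "X"), (2, "A", "Z")]
scan_2 = [(2, "A", "Z"), (1, "A", "X")]


def group_by_first_seen(rows):
    """Hypothetical permissive GROUP BY Column1: keeps the first row it
    meets for each Column1 value and returns one row per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[1], row)
    return list(groups.values())


def order_by_id_desc(rows):
    """ORDER BY ID DESC, applied to the already-grouped result."""
    return sorted(rows, key=lambda r: r[0], reverse=True)


print(order_by_id_desc(group_by_first_seen(scan_1)))  # [(1, 'A', 'X')]
print(order_by_id_desc(group_by_first_seen(scan_2)))  # [(2, 'A', 'Z')]
```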
The SQL standard does allow columns in the select list that are not contained in the GROUP BY or in an aggregate function; however, these columns must be functionally dependent on the grouping columns. From the SQL-2003 standard (5WD-02-Foundation-2003-09, page 346), http://www.wiscorp.com/sql_2003_standard.zip:
15) If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list> , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.
For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table. The following query therefore conforms to the SQL standard; it would run in MySQL but currently fails in many DBMSs (at the time of writing, PostgreSQL is the closest DBMS I know of to correctly implementing the standard; example here):
SELECT ID, Column1, Column2
FROM T
GROUP BY ID;
Since ID is unique for each row, there can only be one value of Column1 and one value of Column2 for each ID, so there is no ambiguity about what to return for each row.
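The idea of functional dependence can be sketched as a check over a table's rows. The standard derives functional dependencies from declared constraints such as the PRIMARY KEY, not by inspecting data. This hypothetical `is_functionally_dependent` helper therefore only tests whether a given set of rows happens to satisfy the dependency.

```python
# The sample table T as (ID, Column1, Column2) tuples.
T = [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")]


def is_functionally_dependent(rows, determinant, dependent):
    """Return True if, in these rows, each value of the `determinant`
    columns (given as tuple positions) maps to exactly one value of the
    `dependent` columns. Checks data only, not declared constraints."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in determinant)
        value = tuple(row[i] for i in dependent)
        # setdefault stores the first value seen for this key and returns
        # it, so any later mismatch means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True


# GROUP BY ID: Column1 and Column2 each have one value per ID.
print(is_functionally_dependent(T, [0], [1, 2]))  # True
# GROUP BY Column1: Column2 has two values (X and Z) for A.
print(is_functionally_dependent(T, [1], [2]))  # False
```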
There's no logical reason why any RDBMS wouldn't let you do this. The usual restriction relates to SELECT DISTINCT, or the presence of a GROUP BY clause.
Current list of RDBMS known to support this: