
Which DBMSs allow ORDER BY on an attribute that is not present in the SELECT clause?

Let's assume I have a table called Cars with 2 columns: CarName, BrandName

Now I want to execute this query:

select CarName
from Cars
order by BrandName

As you can see, I'd like to return a list sorted by a column that is not present in the select part of the query.

The basic (not optimized) execution sequence of sql commands is: from, where, group by, having, select, order by.

The problem that arises is that BrandName isn't part of what is left after the select step has been executed.

I've searched for this in books, on Google and on Stack Overflow, but so far I've only found several SO comments like "I know of a database system that doesn't allow it, but I don't remember which one".

So my questions are:
1) What do the standards SQL-92 or SQL:1999 say about this?
2) Which databases allow this query and which don't?

(Background: A couple of students asked this, and I want to give them the best answer possible)

EDIT:
- Successfully tested for Microsoft SQL Server 2012

asked Dec 03 '13 16:12 by citronas



2 Answers

Your query is perfectly legal syntax: you can order by columns that are not present in the select.

  • Working Demo with MySQL
  • Working Demo with SQL Server
  • Working Demo with Postgresql
  • Working Demo with SQLite
  • Working Demo with Oracle

If you need the full specification of legal ordering, the SQL:2003 standard has a long list of rules about what the ORDER BY should and shouldn't contain (02-Foundation, page 415, section 7.13 <Query expression>, sub part 28). This confirms that your query is legal syntax.
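As a quick local check, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the three sample rows are made up for illustration, and the result only shows SQLite's behaviour, not that of every DBMS.

```python
import sqlite3

# In-memory database; the sample rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (CarName TEXT, BrandName TEXT)")
conn.executemany(
    "INSERT INTO Cars (CarName, BrandName) VALUES (?, ?)",
    [("Golf", "Volkswagen"), ("A4", "Audi"), ("Corolla", "Toyota")],
)

# BrandName drives the sort but is not part of the select list.
car_names = [
    row[0]
    for row in conn.execute("SELECT CarName FROM Cars ORDER BY BrandName")
]
print(car_names)
conn.close()
```

The car names come back in brand order (Audi, Toyota, Volkswagen) even though BrandName is never returned.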

I think your confusion could arise from selecting and/or ordering by columns not present in the GROUP BY, or from ordering by columns not in the select when using DISTINCT.

Both have the same fundamental problem, and MySQL is the only one to my knowledge that allows either.

The problem is that when you use GROUP BY or DISTINCT, several source rows collapse into one result row, so a column that is not part of the GROUP BY or DISTINCT list can have multiple different values for that single result row. Imagine this simple data set:

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |
2   |    A    |    Z     |
3   |    B    |    Y     |

If you write:

SELECT  DISTINCT Column1
FROM    T;

You would get

 Column1 
---------
     A   
     B   

If you then add ORDER BY Column2, which of the two Column2 values would you use to order A by, X or Z? There is no deterministic way to choose a value for Column2.
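The ambiguity can be made concrete with a small sketch, again assuming Python's sqlite3 module and the three sample rows above. The helper distinct_column1 is a hypothetical name introduced here. Stating explicitly which Column2 value represents each group (via an aggregate) removes the ambiguity, and the two possible choices give opposite orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)"
)
conn.executemany(
    "INSERT INTO T (ID, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
)

def distinct_column1(order_expr):
    """Return the distinct Column1 values, sorted by an explicit aggregate.

    order_expr is a hard-coded literal in this sketch, never user input.
    """
    sql = f"SELECT Column1 FROM T GROUP BY Column1 ORDER BY {order_expr}"
    return [row[0] for row in conn.execute(sql)]

# Represent each group by its smallest Column2: A -> X, B -> Y.
by_min = distinct_column1("MIN(Column2)")
# Represent each group by its largest Column2: A -> Z, B -> Y.
by_max = distinct_column1("MAX(Column2)")
print(by_min, by_max)
conn.close()
```

Picking X for A puts A first; picking Z puts A last. A plain ORDER BY Column2 on the DISTINCT query gives the DBMS no rule for choosing between the two.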

The same applies to selecting columns not in the group by. To simplify things just imagine the first two rows of the previous table:

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |
2   |    A    |    Z     |

In MySQL you can write

SELECT  ID, Column1, Column2
FROM    T
GROUP BY Column1;

This actually breaks the SQL standard, but it works in MySQL. The trouble is that it is non-deterministic; the result:

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |

Is no more or less correct than

ID  | Column1 | Column2  |  
----|---------+----------|
2   |    A    |    Z     |

So what you are asking for is one row for each distinct value of Column1, which both result sets satisfy, so how do you know which one you will get? Well, you don't. It seems to be a fairly popular misconception that you can add an ORDER BY clause to influence the results, so that, for example, the following query:

SELECT  ID, Column1, Column2
FROM    T
GROUP BY Column1
ORDER BY ID DESC;

Would ensure that you get the following result:

ID  | Column1 | Column2  |  
----|---------+----------|
2   |    A    |    Z     |

because of the ORDER BY ID DESC. However, this is not true (as demonstrated here).

The MySQL documents state:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

So even though you have an ORDER BY, it is not applied until after one row per group has been selected, and the choice of that one row is non-deterministic.
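If the goal is "the row with the highest ID in each group", one portable approach is to say so explicitly by joining back to a per-group MAX(ID). The following is a sketch under that assumption, using Python's sqlite3 module and the three-row sample table; the aliases t, m and MaxID are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)"
)
conn.executemany(
    "INSERT INTO T (ID, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
)

# The subquery fixes which row represents each group (the highest ID);
# the join then fetches that complete row, so nothing is left to chance.
latest_per_group = conn.execute(
    """
    SELECT t.ID, t.Column1, t.Column2
    FROM T AS t
    JOIN (SELECT Column1, MAX(ID) AS MaxID
          FROM T
          GROUP BY Column1) AS m
      ON t.Column1 = m.Column1 AND t.ID = m.MaxID
    ORDER BY t.ID DESC
    """
).fetchall()
print(latest_per_group)
conn.close()
```

Here the ORDER BY only sorts the final rows; the choice of row per group is made by MAX(ID), not by the sort.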

The SQL standard does allow columns in the select list that are not contained in the GROUP BY or an aggregate function. However, these columns must be functionally dependent on a column in the GROUP BY. From the SQL-2003 standard (5WD-02-Foundation-2003-09 - page 346) - http://www.wiscorp.com/sql_2003_standard.zip

15) If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list> , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.

For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table. The following query therefore conforms to the SQL standard; it would run in MySQL but currently fails in many DBMSs (at the time of writing, PostgreSQL is the closest DBMS I know of to correctly implementing the standard - Example here):

SELECT  ID, Column1, Column2
FROM    T
GROUP BY ID;

Since ID is unique for each row, there can only be one value of Column1 and one value of Column2 for each ID, so there is no ambiguity about what to return for each row.
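The functional dependency can be checked directly on the sample data. This is a sketch using Python's sqlite3 module. Note that SQLite accepts bare columns in a GROUP BY query in any case, so the fact that the query runs says nothing about standard conformance; the point is only that grouping by the primary key leaves exactly one candidate value per column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)"
)
conn.executemany(
    "INSERT INTO T (ID, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
)

# Count groups in which Column1 or Column2 has more than one value.
# Grouping by the primary key means every group holds a single row.
ambiguous_groups = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT ID FROM T
        GROUP BY ID
        HAVING COUNT(DISTINCT Column1) > 1 OR COUNT(DISTINCT Column2) > 1
    )
    """
).fetchone()[0]

# With no ambiguous groups, the grouped query simply returns every row.
grouped_rows = conn.execute(
    "SELECT ID, Column1, Column2 FROM T GROUP BY ID ORDER BY ID"
).fetchall()
print(ambiguous_groups, grouped_rows)
conn.close()
```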

answered Sep 19 '22 19:09 by GarethD


There's no logical reason why any RDBMS wouldn't let you do this. The usual restriction relates to SELECT DISTINCT, or the presence of a GROUP BY clause.

Current list of RDBMS known to support this:

  • Microsoft SQL Server 2012
  • Oracle
  • PostgreSQL
  • MySQL
  • DB2

answered Sep 16 '22 19:09 (3 revs, 3 users 38%)