I am going through the pain of normalising a horrible legacy database, and have discovered what I think is a bug with the DBMS.
This query returns the results as I expect:
SELECT DISTINCT RIGHT(SQUEEZE(thing_id), 2) AS thing_id, TRIM(thing_name)
FROM thing
ORDER BY thing_id, thing_name;
(16 rows)
The first time I ran the query, I had inadvertently used the wrong columns in the ORDER BY, as below:
SELECT DISTINCT RIGHT(SQUEEZE(thing_id), 2) AS thing_id, TRIM(thing_name)
FROM thing
ORDER BY thing_name, location;
(33 rows)
Note that the only thing to change is the ORDER BY, and the number of rows returned increases from 16 to 33. The results it gives are not DISTINCT as the query specifies.
I believe this is an outright bug, but a coworker says that it is normal because when we order by "location", it is selected an included invisibly in the results.
Should the ORDER BY ever affect the number of rows returned in a SELECT query?
EDIT: I had another person look at the queries AND I copied the queries to two seperate files, then ran a diff command on them. It is 100% certain that the only difference between the two queries is the columns listed in the ORDER BY.
UPDATE: Ingres have since release patch 14301 with bugfix: "Bug 126640 (GENERIC) Query with order-by expression and distinct aggregate returns more rows than expected. The columns in the order-by expression are not in the select list."
i.e. The query in question will now result in an error as the results are not correct.
The issue I see is that the second query has a column (location
) in the ORDER BY
that is not included in the SELECT DISTINCT
list. Actually both of the queries are invalid SQL (despite that Ingres seems to allow them). I simplified them (so the first is ok) to:
Query one (valid SQL):
SELECT DISTINCT
thing_id
, thing_name
FROM thing
ORDER BY thing_id
, thing_name ;
Query two (invalid SQL, should produce error):
SELECT DISTINCT
thing_id
, thing_name
FROM thing
ORDER BY thing_name
, location;
Why should it give error? Because the ORDER BY
should be processed after SELECT
and DISTINCT
. So, two or more rows in the original table may have same thing_id
and thing_name
but different location
. These rows will be collapsed into one. So, there is no location value to be useed for the ordering. Even if there was (a hidden location valiue) kept, which one of the many should it be?
SELECT DISTINCT
queries can be rewritten with SELECT ALL
and GROUP BY
(also invalid in this case):
SELECT ALL
thing_id
, thing_name
FROM thing
GROUP BY thing_id
, thing_name
ORDER BY thing_name
, location;
The above (query 2) actually does produce error in PostgreSQL, SQL-Server and Oracle. Test it in SQL-Fiddle
From the erroneous number of rows returned by the second query in Ingres, I guess that what happens behind the scenes is that he location
is secretly kept in the SELECT
list so it can be used for the ORDER BY
and then removed. This, in combination with the DISTINCT
results in the non-standard buggy behaviour:
SELECT DISTINCT
thing_id
, thing_name
(, location --- hidden column)
FROM thing
ORDER BY thing_name
, location;
You can call it either a bug or a feature, it doesn't matter, as long as you know it should really not be allowed in the first place.
It seems that a similar issue has been reported a year ago in Actian forum: Problem with DISTINCT + ORDER BY and supposedly fixed. No idea which version they are refering to or if it has been actually fixed or not (and what "fix" means).
If you want the query to be valid and behave as you expect, you could use somethign like this:
SELECT
RIGHT(SQUEEZE(thing_id), 2) AS squeezed_thing_id
, TRIM(thing_name) AS trimmed_thing_name
, MIN(location) AS a_location --- or MAX()
FROM
thing
GROUP BY
RIGHT(SQUEEZE(thing_id), 2)
, TRIM(thing_name)
ORDER BY
trimmed_thing_name
, a_location ;
Since the order of operations in SQL is:
FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY
this does seem to be a bug indeed. Which DBMS is it?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With