Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL: strange collision of ORDER BY and LIMIT/OFFSET

I'm trying to do this in PostgreSQL 9.1:

SELECT m.id, vm.id, vm.value
FROM m
LEFT JOIN vm ON vm.m_id = m.id and vm.variation_id = 1
ORDER BY lower(trim(vm.value)) COLLATE "C" ASC LIMIT 10 OFFSET 120

The result is:

 id |  id | value
----+-----+---------------
504 | 511 | "andr-223322"
506 | 513 | "andr-322223"
824 | 831 | "angHybrid"
866 | 873 | "Another thing"
493 | 500 | "App update required!"
837 | 844 | "App update required!"
471 | 478 | "April"
905 | 912 | "Are you sure you want to delete this thing?"
 25 |  29 | "Assignment"
196 | 201 | "AT ADDRESS"

Ok, let's execute the same query with OFFSET 130:

 id |  id | value
----+-----+---------------
196 | 201 | "AT ADDRESS"
256 | 261 | "Att Angle"
190 | 195 | "Att Angle"
273 | 278 | "Att Angle:"
830 | 837 | "attAngle"
475 | 482 | "August"
710 | 717 | "Averages"
411 | 416 | "AVG"
692 | 699 | "AVG SHAPE"
410 | 415 | "AVGs"

and we see our AT ADDRESS item again, but at the beginning!!!

The fact is that the vm table contains two following items:

 id | m_id | value
----+------+---------------
201 |  196 | "AT ADDRESS"
599 |  592 | "At Address"

I cure this situation with a workaround:

(lower(trim(vm.value)) || vm.id)

but What The Hell ???!!! Why do I have to use a workaround?

like image 261
Paul Avatar asked Mar 01 '13 16:03

Paul


1 Answers

Swearing won't change the SQL standard that defines this behaviour.
The order of rows is undefined unless specified in ORDER BY. Per documentation:

If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.

Since you didn't define an order for these two peers (in your sort order):

 id | m_id | value
----+------+---------------
201 |  196 | "AT ADDRESS"
599 |  592 | "At Address"

.. you get arbitrary ordering - whatever is convenient for Postgres. A query with LIMIT often uses a different query plan, which can explain different results.

Fix:

ORDER BY lower(trim(vm.value)) COLLATE "C", vm.id;

Or (maybe more meaningful - possibly also tune to existing indexes):

ORDER BY lower(trim(vm.value)) COLLATE "C", vm.value, vm.id;

(This is unrelated to the use of COLLATE "C" here, btw.)
Don't concatenate for this purpose, that's much more expensive and potentially makes it impossible to use an index (unless you have an index on that precise expression). Add another expression that kicks in when prior expressions in the ORDER BY list leave ambiguity.

Also, since you have a LEFT JOIN there, rows in m without match in vm have null values for all current ORDER BY expressions. They come last and are sorted arbitrarily otherwise. If you want a stable sort order overall, you need to deal with that, too. Like:

ORDER BY lower(trim(vm.value)) COLLATE "C", vm.id, m.id;

Asides

Why store the double quotes? Seems to be costly noise. You might be better off without them. You can always add the quotes on output if need be.

Many clients cannot deal with the same column name multiple times in one result. You need a column alias for at least one of your id columns: SELECT m.id AS m_id, vm.id AS vm_id .... Goes to show why "id" for a column is an anti-pattern to begin with.

like image 150
Erwin Brandstetter Avatar answered Sep 20 '22 23:09

Erwin Brandstetter