I have this query (in PostgreSQL):
SELECT "table_1".* FROM "table_1"
INNER JOIN "join_table"
ON "table_1"."id" = "join_table"."table_1_id"
WHERE "join_table"."table_2_id" = 650727
ORDER BY table_1.created_at DESC
LIMIT 1
Which returns 1 result, but is taking ~250-300 ms to execute.
There are btree indexes on table_1.created_at, as well as on join_table.table_1_id and join_table.table_2_id.
When I ONLY remove the LIMIT 1 from the query, the execution time drops down to ~13 ms. This specific query currently only returns one result (without the LIMIT), but there are others with a different value in the WHERE that may return more (which is why the LIMIT is necessary).
Why is adding a LIMIT to a query that is already only returning a single result bloating the execution time so much?
Here is the explain plan with the LIMIT 1 (these are always hard for me to fully understand...): http://explain.depesz.com/s/rOy
And here is the explain plan without LIMIT 1: http://explain.depesz.com/s/q3d7
Additionally, if I keep the LIMIT 1 but change the ORDER BY to ASC, the query goes down to 13 ms also. And if I change the LIMIT to LIMIT 20 (but keep the ORDER BY DESC), it only takes 22 ms... wtf!?
So it has something to do with the combination of ORDER BY DESC and LIMIT 1 (exactly 1).
Ok, this is a pretty classic case.
Whenever you use LIMIT (or the like, such as FETCH FIRST ... ROWS ONLY), the optimizer tries to make fetching only the first row(s) as fast as possible. That means the optimizer prefers execution plans where the first cost value is low, not the second one shown in the execution plan. Remember: the two cost values shown by PostgreSQL (e.g., cost=48.150..6,416.240) are the startup cost (48.150) and the total execution cost (6,416.240).
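To see where those numbers appear, here is a sketch; the plan line and figures in the comment below are purely illustrative and not taken from your actual plan:

EXPLAIN
SELECT "table_1".* FROM "table_1"
INNER JOIN "join_table"
ON "table_1"."id" = "join_table"."table_1_id"
WHERE "join_table"."table_2_id" = 650727
ORDER BY table_1.created_at DESC
LIMIT 1;

-- Each plan node prints something along the lines of:
--   Index Scan Backward using table_1_created_at_idx on table_1  (cost=0.43..6416.24 rows=... width=...)
-- where 0.43 is the startup cost and 6416.24 the total cost. With LIMIT 1,
-- the planner leans towards plans whose startup cost is small.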
The "problem" here is that you have an index which supports your ORDER BY
clause. So, PostgreSQL thinks that it can just go through this index (in reverse order due to the DESC
modifier in your query) and check for each row in the other table whether it satisfies the other WHERE
clause or not. The problem is that the optimizer has no way of knowing whether that will be one of the very first rows or rather one at the end (according to the ORDER BY
). The optimizer does an arbitrary guess an believes the matching row will be more towards the begin than the end. This optimistic estimate is then used to calculate the cost value which turns out to be too optimistic so that PostgreSQL finally settles down on a bad execution plan.
When you change the ORDER BY ... DESC to ORDER BY ... ASC, the optimizer makes the same arbitrary but optimistic estimate, which just happens to be more accurate in that case, hence you get a better execution time.
However, from an optimization perspective, the root cause is that the optimizer estimates that 2,491 rows will match the WHERE clause tango = 650727 (the anonymized plan's name for table_2_id = 650727). If the optimizer correctly estimated that this hits only a few rows, the problem would likely not occur.
The WHERE clause is sufficiently trivial that a good estimate should be no problem. So the main question is: how are the statistics on that table?
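One way to check is a quick look at PostgreSQL's standard statistics views (a sketch; the table and column names are taken from your query):

-- When were the tables last (auto-)analyzed, and how many live rows does
-- the planner currently assume?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('table_1', 'join_table');

-- Per-column statistics for the filter column (distinct-value estimate etc.):
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'join_table' AND attname = 'table_2_id';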
There are several ways to cope with this problem:

- Update the statistics (run ANALYZE) and see if that alone helps.
- Increase the statistics target for the relevant column (ALTER TABLE ... SET STATISTICS). This also increases the sample size used to gather the statistics, which means ANALYZE takes longer but yields more accurate results.

In theory, this should be enough to fix the problem; a minimal sketch of both statements follows below.
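A minimal sketch (the choice of column and the statistics target of 1000 are assumptions; adjust them to your schema):

-- 1. Refresh the statistics and retest the query.
ANALYZE table_1;
ANALYZE join_table;

-- 2. If the row estimate is still far off, raise the per-column statistics
--    target (the default is 100), then analyze again.
ALTER TABLE join_table ALTER COLUMN table_2_id SET STATISTICS 1000;
ANALYZE join_table;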
However, other options are:

- If you don't need the index on created_at for other reasons (like other queries), get rid of it.
- Rewrite the query so that the ORDER BY clause uses the same table as the WHERE clause: if you are lucky, you might have a column in join_table that has the same order as table_1.created_at, so that it does not make any difference which one you order by (see the sketch after this list). However, be careful: this is easy to get wrong (e.g., sequential numbers filled by sequences might have outliers).
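As an illustration of that last option (purely hypothetical: it assumes join_table has a column, here called id, whose order matches table_1.created_at; verify that assumption before relying on it):

SELECT "table_1".* FROM "table_1"
INNER JOIN "join_table"
ON "table_1"."id" = "join_table"."table_1_id"
WHERE "join_table"."table_2_id" = 650727
ORDER BY "join_table"."id" DESC
LIMIT 1;

-- With this variant the filter and the ordering use the same table, so a
-- single index on join_table (table_2_id, id) could serve both at once.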