
Extremely slow PostgreSQL query with ORDER and LIMIT clauses

I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:

SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0; 

This query takes a very long time to run (Rails times out while running it). There is an index on all IDs in question. The curious part is, if I remove either the ORDER BY clause or the LIMIT clause, it runs almost instantaneously.

I'm assuming that the presence of both ORDER BY and LIMIT is making PostgreSQL make some bad choices in query planning. Anyone have any ideas on how to fix this?

In case it helps, here is the EXPLAIN for all 3 cases:

//////// Both ORDER and LIMIT
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0;

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..16663.44 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..25355084.05 rows=7608 width=663)
         Join Filter: (foos.bar_id = bars.id)
         ->  Index Scan Backward using foos_pkey on foos  (cost=0.00..11804133.33 rows=4963477 width=663)
               Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
         ->  Materialize  (cost=0.00..658.96 rows=182 width=4)
               ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                     Index Cond: (baz_id = 13266)
(8 rows)

//////// Just LIMIT
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) LIMIT 5 OFFSET 0;

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..22.21 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(7 rows)

//////// Just ORDER
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC;

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36515.17..36534.19 rows=7608 width=663)
   Sort Key: foos.id
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(8 rows)
jakeboxer asked May 17 '11 22:05




2 Answers

When you have both the LIMIT and ORDER BY, the optimizer has decided it is faster to limp through the unfiltered records on foos in descending key order (the backward scan on foos_pkey) until it gets five matches for the rest of the criteria. In the other cases, it starts from the bars rows matching baz_id, looks up the corresponding foos rows through index_foos_on_bar_id in a nested loop, and returns all the records.
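One way to check this reading is to compare the planner's estimates with what actually happens at run time. This is only a sketch, assuming a PostgreSQL version that accepts the parenthesized EXPLAIN options (9.0 or later). Note that EXPLAIN ANALYZE really executes the statement, so the slow variant will take as long as the original query.

```sql
-- ANALYZE runs the query and reports actual row counts and timings
-- next to the planner's estimates; BUFFERS adds page access counts per node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE "bars".baz_id = 13266
ORDER BY "foos"."id" DESC
LIMIT 5;
```

A very large actual row count on the Index Scan Backward node would mean the scan walked a long way through foos before it found five matching rows.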

Offhand, I'd say the problem is that PG doesn't grok the joint distribution of the various ids and that's why the plan is so sub-optimal.

For possible solutions: I'll assume that you have run ANALYZE recently. If not, do so. Stale statistics may explain why your estimated costs are high even on the version that returns fast. If the problem persists, perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query.
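A minimal sketch of both suggestions, using the table and column names from the question. The alias sorted_foos is made up for illustration. Whether the planner actually chooses a different plan for the wrapped query depends on the PostgreSQL version and the statistics it has, so it is worth comparing the EXPLAIN output before relying on it.

```sql
-- Refresh planner statistics for both tables involved in the join.
ANALYZE foos;
ANALYZE bars;

-- Sort inside a subselect and apply the LIMIT in the outer query.
-- "sorted_foos" is a hypothetical alias; older PostgreSQL versions
-- require some alias on a subquery in FROM.
EXPLAIN
SELECT *
FROM (
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE "bars".baz_id = 13266
    ORDER BY "foos"."id" DESC
) AS sorted_foos
LIMIT 5;
```

If the plan now shows a Sort on top of the nested loop driven by index_bars_on_baz_id, rather than the backward scan on foos_pkey, drop the EXPLAIN keyword to run the query itself.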

Andrew Lazarus answered Oct 11 '22 11:10



It probably happens because it tries to order first and only then select. Why not try sorting the result in an outer select instead? Something like: SELECT * FROM (SELECT ... INNER JOIN ETC...) ORDER BY ... DESC
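Spelled out with the names from the question, that idea might look like the sketch below. The alias matching_foos is hypothetical, and the LIMIT 5 is added here on the assumption that the five-row limit from the question is still wanted. One caveat: PostgreSQL usually flattens a simple subquery like this back into the outer query, so the resulting plan may be identical to the original one. Checking with EXPLAIN will tell.

```sql
-- Join and filter in the inner query, then sort and limit outside.
-- "matching_foos" is a hypothetical alias for the inner result.
EXPLAIN
SELECT *
FROM (
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE "bars".baz_id = 13266
) AS matching_foos
ORDER BY matching_foos.id DESC
LIMIT 5;
```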

Davide Ungari answered Oct 11 '22 10:10
