I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0;
This query takes a very long time to run (Rails times out while running it). There is an index on all of the IDs in question. The curious part is, if I remove either the ORDER BY clause or the LIMIT clause, it runs almost instantaneously.

I'm assuming that the presence of both ORDER BY and LIMIT is making PostgreSQL make some bad choices in query planning. Does anyone have any ideas on how to fix this?

In case it helps, here is the EXPLAIN output for all three cases:
//////// Both ORDER and LIMIT

SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0;

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..16663.44 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..25355084.05 rows=7608 width=663)
         Join Filter: (foos.bar_id = bars.id)
         ->  Index Scan Backward using foos_pkey on foos  (cost=0.00..11804133.33 rows=4963477 width=663)
               Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
         ->  Materialize  (cost=0.00..658.96 rows=182 width=4)
               ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                     Index Cond: (baz_id = 13266)
(8 rows)

//////// Just LIMIT

SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) LIMIT 5 OFFSET 0;

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..22.21 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(7 rows)

//////// Just ORDER

SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC;

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36515.17..36534.19 rows=7608 width=663)
   Sort Key: foos.id
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(8 rows)
Just like any advanced relational database, PostgreSQL uses a cost-based query optimizer that tries to turn your SQL queries into something efficient that executes in as little time as possible.
When you have both the LIMIT and the ORDER BY, the optimizer has decided it is faster to walk backward through the unfiltered foos records by primary key, checking each row against the join and the other filter conditions, until it collects five matches. In the other cases, it simply runs the join as a nested loop driven by the bars index and returns all the matching records.
Offhand, I'd say the problem is that PG doesn't grok the joint distribution of the various ids and that's why the plan is so sub-optimal.
For possible solutions: I'll assume that you have run ANALYZE recently. If not, do so; stale statistics may explain why the estimated costs are so high even on the version that returns quickly. If the problem persists, try running the ORDER BY in a subselect and applying the LIMIT in an outer query.
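The subselect-then-LIMIT rewrite might look like the following (an untested sketch against the question's schema; note that PostgreSQL's planner is free to flatten a plain subquery back into the outer query, so this is not guaranteed to change the plan on every version):

```sql
-- Sketch: do the join and ORDER BY in a derived table, then apply the
-- LIMIT in the outer query. The OFFSET 0 in the subquery has historically
-- discouraged the planner from flattening the subquery into the outer
-- query, keeping the sort separate from the limit.
SELECT *
FROM (
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE "bars".baz_id = 13266
    ORDER BY "foos"."id" DESC
    OFFSET 0
) AS sorted_foos   -- PostgreSQL requires an alias on a derived table
LIMIT 5;
```

Comparing EXPLAIN output before and after the rewrite is the only way to confirm the planner actually picked a different plan.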
This probably happens because the planner tries to satisfy the ordering before doing the selection. Why not sort the result in an outer query instead? Something like: SELECT * FROM (SELECT ... INNER JOIN ...) AS sub ORDER BY ... DESC (PostgreSQL requires an alias on the derived table).