I currently have a postgresql query that is slow because of an OR statement. It's apparently not using an index because of it. Rewriting this query failed so far. The query: <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE a0_.advert_category_id IN ( 1136 ) OR a1_.parent_id IN ( 1136 ) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=1.658..50.809 rows=15 loops=1) -> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.657..50.790 rows=15 loops=1) -> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.013..16.449 rows=12405 loops=1) -> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405) Index Cond: (id = a0_.advert_category_id) Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136)) Rows Removed by Filter: 1 Total runtime: 50.860 ms </code></pre> Cause of slowness: <code>Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))</code> I've tried using INNER JOIN instead of the WHERE statement: <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id AND ( a0_.advert_category_id IN ( 1136 ) OR a1_.parent_id IN ( 1136 ) ) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=4.667..139.955 rows=15 loops=1) -> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=4.666..139.932 rows=15 loops=1) -> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.019..100.765 rows=12405 loops=1) -> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405) Index Cond: (id = a0_.advert_category_id) Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136)) Rows Removed by Filter: 1 Total runtime: 140.048 ms </code></pre> The query speeds up when I remove one of the OR criteria. So I've made a UNION to see the results. It's very fast! But I do not consider this as a solution: <pre class="prettyprint"><code>EXPLAIN ANALYZE (SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE a0_.advert_category_id IN ( 1136 ) ORDER BY a0_.created_date DESC LIMIT 15) UNION (SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE a1_.parent_id IN ( 1136 ) ORDER BY a0_.created_date DESC LIMIT 15); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=4125.70..4126.00 rows=30 width=12) (actual time=7.945..7.951 rows=15 loops=1) -> Append (cost=1120.82..4125.63 rows=30 width=12) (actual time=6.811..7.929 rows=15 loops=1) -> Subquery Scan on "*SELECT* 1" (cost=1120.82..1121.01 rows=15 width=12) (actual time=6.810..6.840 rows=15 loops=1) -> Limit (cost=1120.82..1120.86 rows=15 width=12) (actual time=6.809..6.825 rows=15 loops=1) -> Sort (cost=1120.82..1121.56 rows=295 width=12) (actual time=6.807..6.813 rows=15 loops=1) Sort Key: a0_.created_date Sort Method: top-N heapsort Memory: 25kB -> Nested Loop (cost=10.59..1113.59 rows=295 width=12) (actual time=1.151..6.639 rows=220 loops=1) -> Index Only Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..8.27 rows=1 width=4) (actual time=1.030..1.033 rows=1 loops=1) Index Cond: (id = 1136) Heap Fetches: 1 -> Bitmap Heap Scan on advert a0_ (cost=10.59..1102.37 rows=295 width=16) (actual time=0.099..5.287 rows=220 loops=1) Recheck Cond: (advert_category_id = 1136) -> Bitmap Index Scan on idx_54f1f40bd4436821 (cost=0.00..10.51 rows=295 width=0) (actual time=0.073..0.073 rows=220 loops=1) Index Cond: (advert_category_id = 1136) -> Subquery Scan on "*SELECT* 2" (cost=3004.43..3004.62 rows=15 width=12) (actual time=1.072..1.072 rows=0 loops=1) -> Limit (cost=3004.43..3004.47 rows=15 width=12) (actual time=1.071..1.071 rows=0 loops=1) -> Sort (cost=3004.43..3005.99 rows=626 width=12) (actual time=1.069..1.069 rows=0 loops=1) Sort Key: a0_.created_date Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=22.91..2989.07 rows=626 width=12) (actual time=1.056..1.056 rows=0 loops=1) -> Index Scan using idx_d84ab8ea727aca70 on advertcategory a1_ (cost=0.00..8.27 rows=1 width=4) (actual time=1.054..1.054 rows=0 loops=1) Index Cond: (parent_id = 1136) -> Bitmap Heap Scan on advert a0_ (cost=22.91..2972.27 rows=853 width=16) (never executed) Recheck Cond: (advert_category_id = a1_.id) -> Bitmap Index Scan on idx_54f1f40bd4436821 (cost=0.00..22.70 rows=853 width=0) (never executed) Index Cond: (advert_category_id = a1_.id) Total runtime: 8.940 ms (28 rows) </code></pre> Tried reversing the IN statement: <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE 1136 IN ( a0_.advert_category_id, a1_.parent_id ) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=1.848..62.461 rows=15 loops=1) -> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.847..62.441 rows=15 loops=1) -> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.028..27.316 rows=12405 loops=1) -> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405) Index Cond: (id = a0_.advert_category_id) Filter: ((1136 = a0_.advert_category_id) OR (1136 = parent_id)) Rows Removed by Filter: 1 Total runtime: 62.506 ms (8 rows) </code></pre> Tried using EXISTS: <pre class="prettyprint"><code>EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE EXISTS(SELECT test.id FROM advert test INNER JOIN advertcategory test_cat ON test_cat.id = test.advert_category_id WHERE test.id = a0_.id AND ( test.advert_category_id IN ( 1136 ) OR test_cat.parent_id IN ( 1136 ) )) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=45538.18..45538.22 rows=15 width=12) (actual time=524.654..524.673 rows=15 loops=1) -> Sort (cost=45538.18..45540.48 rows=921 width=12) (actual time=524.651..524.658 rows=15 loops=1) Sort Key: a0_.created_date Sort Method: top-N heapsort Memory: 25kB -> Hash Join (cost=39803.59..45515.58 rows=921 width=12) (actual time=497.362..524.436 rows=220 loops=1) Hash Cond: (a0_.advert_category_id = a1_.id) -> Nested Loop (cost=39786.88..45486.21 rows=921 width=16) (actual time=496.748..523.501 rows=220 loops=1) -> HashAggregate (cost=39786.88..39796.09 rows=921 width=4) (actual time=496.705..496.872 rows=220 loops=1) -> Hash Join (cost=16.71..39784.58 rows=921 width=4) (actual time=1.210..496.294 rows=220 loops=1) Hash Cond: (test.advert_category_id = test_cat.id) Join Filter: ((test.advert_category_id = 1136) OR (test_cat.parent_id = 1136)) Rows Removed by Join Filter: 353584 -> Seq Scan on advert test (cost=0.00..33134.04 rows=353804 width=8) (actual time=0.002..177.953 rows=353804 loops=1) -> Hash (cost=9.65..9.65 rows=565 width=8) (actual time=0.622..0.622 rows=565 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 22kB -> Seq Scan on advertcategory test_cat (cost=0.00..9.65 rows=565 width=8) (actual time=0.005..0.327 rows=565 loops=1) -> Index Scan using advert_pkey on advert a0_ (cost=0.00..6.17 rows=1 width=16) (actual time=0.117..0.118 rows=1 loops=220) Index Cond: (id = test.id) -> Hash (cost=9.65..9.65 rows=565 width=4) (actual time=0.604..0.604 rows=565 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 20kB -> Seq Scan on advertcategory a1_ (cost=0.00..9.65 rows=565 width=4) (actual time=0.010..0.285 rows=565 loops=1) Total runtime: 524.797 ms </code></pre> Advert table (stripped down): <pre class="prettyprint"><code>353804 rows Table "public.advert" Column | Type | Modifiers | Storage | Stats target | Description -----------------------------+--------------------------------+-----------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('advert_id_seq'::regclass) | plain | | advert_category_id | integer | not null | plain | | Indexes: "idx_54f1f40bd4436821" btree (advert_category_id) "advert_created_date_idx" btree (created_date) Foreign-key constraints: "fk_54f1f40bd4436821" FOREIGN KEY (advert_category_id) REFERENCES advertcategory(id) ON DELETE RESTRICT Has OIDs: no </code></pre> Category table (stripped down): <pre class="prettyprint"><code>565 rows Table "public.advertcategory" Column | Type | Modifiers -----------+---------+------------------------------------------------------------- id | integer | not null default nextval('advertcategory_id_seq'::regclass) parent_id | integer | active | boolean | not null system | boolean | not null Indexes: "advertcategory_pkey" PRIMARY KEY, btree (id) "idx_d84ab8ea727aca70" btree (parent_id) Foreign-key constraints: "fk_d84ab8ea727aca70" FOREIGN KEY (parent_id) REFERENCES advertcategory(id) ON DELETE RESTRICT </code></pre> Short server config: <pre class="prettyprint"><code> version -------------------------------------------------------------------------------------------------------------- PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit name | current_setting | source ----------------------------+--------------------+---------------------- shared_buffers | 1800MB | configuration file work_mem | 4MB | configuration file </code></pre> As you can see, none of the proper solutions give speed improvement. Only the UNION solution to split the OR statement improves performance. But I can't use that because this query is being used through my ORM framework with alot of other filter options. Plus if I can do this, why doesn't the optimizer do it? It seems a really simple optimization. Any hints on this? A solution to this small problem would greatly appreciated!

Entirely new approach. Your <code>where</code> condition is on two tables, but that seems unnecessary. The first change would be: <pre class="prettyprint"><code>where a1_.id = 1136 or a1_.parent_id = 1136 </code></pre> I think the structure you want is a scan on the category table and then fetches from the advert table. To help, you can create an index on <code>advert(advert_category_id, created_date)</code>. I would be tempted to write the query by moving the <code>where</code> clause into a subquery. I don't know if this would effect performance: <pre class="prettyprint"><code>SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN (select ac.* from advertcategory ac where ac.id = 1136 or ac.parent_id = 1136 ) ac ON a0_.advert_category_id = ac.id ORDER BY a0_.created_date DESC LIMIT 15; </code></pre>

Slow OR statement in postgresql

Tags:

sql

postgresql

I currently have a postgresql query that is slow because of an OR statement. It's apparently not using an index because of it. Rewriting this query failed so far.

The query:

EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  a0_.advert_category_id IN ( 1136 )
        OR a1_.parent_id IN ( 1136 )
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                           QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..27542.49 rows=15 width=12) (actual time=1.658..50.809 rows=15 loops=1)
   ->  Nested Loop  (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.657..50.790 rows=15 loops=1)
         ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.013..16.449 rows=12405 loops=1)
         ->  Index Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
               Index Cond: (id = a0_.advert_category_id)
               Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
               Rows Removed by Filter: 1
 Total runtime: 50.860 ms

Cause of slowness: Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))

I've tried using INNER JOIN instead of the WHERE statement:

EXPLAIN ANALYZE  SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
                  AND ( a0_.advert_category_id IN ( 1136 )
                         OR a1_.parent_id IN ( 1136 ) )
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..27542.49 rows=15 width=12) (actual time=4.667..139.955 rows=15 loops=1)
   ->  Nested Loop  (cost=0.00..1691109.07 rows=921 width=12) (actual time=4.666..139.932 rows=15 loops=1)
         ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.019..100.765 rows=12405 loops=1)
         ->  Index Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
               Index Cond: (id = a0_.advert_category_id)
               Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
               Rows Removed by Filter: 1
 Total runtime: 140.048 ms

The query speeds up when I remove one of the OR criteria. So I've made a UNION to see the results. It's very fast! But I do not consider this as a solution:

EXPLAIN ANALYZE 
 (SELECT a0_.id AS id0
 FROM   advert a0_
        INNER JOIN advertcategory a1_
                ON a0_.advert_category_id = a1_.id
 WHERE  a0_.advert_category_id IN ( 1136 )
 ORDER  BY a0_.created_date DESC
 LIMIT  15)
UNION
(SELECT a0_.id AS id0
 FROM   advert a0_
        INNER JOIN advertcategory a1_
                ON a0_.advert_category_id = a1_.id
 WHERE  a1_.parent_id IN ( 1136 )
 ORDER  BY a0_.created_date DESC
 LIMIT  15);  

                                                                               QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=4125.70..4126.00 rows=30 width=12) (actual time=7.945..7.951 rows=15 loops=1)
   ->  Append  (cost=1120.82..4125.63 rows=30 width=12) (actual time=6.811..7.929 rows=15 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=1120.82..1121.01 rows=15 width=12) (actual time=6.810..6.840 rows=15 loops=1)
               ->  Limit  (cost=1120.82..1120.86 rows=15 width=12) (actual time=6.809..6.825 rows=15 loops=1)
                     ->  Sort  (cost=1120.82..1121.56 rows=295 width=12) (actual time=6.807..6.813 rows=15 loops=1)
                           Sort Key: a0_.created_date
                           Sort Method: top-N heapsort  Memory: 25kB
                           ->  Nested Loop  (cost=10.59..1113.59 rows=295 width=12) (actual time=1.151..6.639 rows=220 loops=1)
                                 ->  Index Only Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..8.27 rows=1 width=4) (actual time=1.030..1.033 rows=1 loops=1)
                                       Index Cond: (id = 1136)
                                       Heap Fetches: 1
                                 ->  Bitmap Heap Scan on advert a0_  (cost=10.59..1102.37 rows=295 width=16) (actual time=0.099..5.287 rows=220 loops=1)
                                       Recheck Cond: (advert_category_id = 1136)
                                       ->  Bitmap Index Scan on idx_54f1f40bd4436821  (cost=0.00..10.51 rows=295 width=0) (actual time=0.073..0.073 rows=220 loops=1)
                                             Index Cond: (advert_category_id = 1136)
         ->  Subquery Scan on "*SELECT* 2"  (cost=3004.43..3004.62 rows=15 width=12) (actual time=1.072..1.072 rows=0 loops=1)
               ->  Limit  (cost=3004.43..3004.47 rows=15 width=12) (actual time=1.071..1.071 rows=0 loops=1)
                     ->  Sort  (cost=3004.43..3005.99 rows=626 width=12) (actual time=1.069..1.069 rows=0 loops=1)
                           Sort Key: a0_.created_date
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=22.91..2989.07 rows=626 width=12) (actual time=1.056..1.056 rows=0 loops=1)
                                 ->  Index Scan using idx_d84ab8ea727aca70 on advertcategory a1_  (cost=0.00..8.27 rows=1 width=4) (actual time=1.054..1.054 rows=0 loops=1)
                                       Index Cond: (parent_id = 1136)
                                 ->  Bitmap Heap Scan on advert a0_  (cost=22.91..2972.27 rows=853 width=16) (never executed)
                                       Recheck Cond: (advert_category_id = a1_.id)
                                       ->  Bitmap Index Scan on idx_54f1f40bd4436821  (cost=0.00..22.70 rows=853 width=0) (never executed)
                                             Index Cond: (advert_category_id = a1_.id)
 Total runtime: 8.940 ms
(28 rows)

Tried reversing the IN statement:

EXPLAIN ANALYZE  SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  1136 IN ( a0_.advert_category_id, a1_.parent_id )
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..27542.49 rows=15 width=12) (actual time=1.848..62.461 rows=15 loops=1)
   ->  Nested Loop  (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.847..62.441 rows=15 loops=1)
         ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.028..27.316 rows=12405 loops=1)
         ->  Index Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
               Index Cond: (id = a0_.advert_category_id)
               Filter: ((1136 = a0_.advert_category_id) OR (1136 = parent_id))
               Rows Removed by Filter: 1
 Total runtime: 62.506 ms
(8 rows)

Tried using EXISTS:

EXPLAIN ANALYZE  SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  EXISTS(SELECT test.id
              FROM   advert test
                     INNER JOIN advertcategory test_cat
                             ON test_cat.id = test.advert_category_id
              WHERE  test.id = a0_.id
                     AND ( test.advert_category_id IN ( 1136 )
                            OR test_cat.parent_id IN ( 1136 ) ))
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=45538.18..45538.22 rows=15 width=12) (actual time=524.654..524.673 rows=15 loops=1)
   ->  Sort  (cost=45538.18..45540.48 rows=921 width=12) (actual time=524.651..524.658 rows=15 loops=1)
         Sort Key: a0_.created_date
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Join  (cost=39803.59..45515.58 rows=921 width=12) (actual time=497.362..524.436 rows=220 loops=1)
               Hash Cond: (a0_.advert_category_id = a1_.id)
               ->  Nested Loop  (cost=39786.88..45486.21 rows=921 width=16) (actual time=496.748..523.501 rows=220 loops=1)
                     ->  HashAggregate  (cost=39786.88..39796.09 rows=921 width=4) (actual time=496.705..496.872 rows=220 loops=1)
                           ->  Hash Join  (cost=16.71..39784.58 rows=921 width=4) (actual time=1.210..496.294 rows=220 loops=1)
                                 Hash Cond: (test.advert_category_id = test_cat.id)
                                 Join Filter: ((test.advert_category_id = 1136) OR (test_cat.parent_id = 1136))
                                 Rows Removed by Join Filter: 353584
                                 ->  Seq Scan on advert test  (cost=0.00..33134.04 rows=353804 width=8) (actual time=0.002..177.953 rows=353804 loops=1)
                                 ->  Hash  (cost=9.65..9.65 rows=565 width=8) (actual time=0.622..0.622 rows=565 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 22kB
                                       ->  Seq Scan on advertcategory test_cat  (cost=0.00..9.65 rows=565 width=8) (actual time=0.005..0.327 rows=565 loops=1)
                     ->  Index Scan using advert_pkey on advert a0_  (cost=0.00..6.17 rows=1 width=16) (actual time=0.117..0.118 rows=1 loops=220)
                           Index Cond: (id = test.id)
               ->  Hash  (cost=9.65..9.65 rows=565 width=4) (actual time=0.604..0.604 rows=565 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 20kB
                     ->  Seq Scan on advertcategory a1_  (cost=0.00..9.65 rows=565 width=4) (actual time=0.010..0.285 rows=565 loops=1)
 Total runtime: 524.797 ms

Advert table (stripped down):

353804 rows
                                                                   Table "public.advert"
           Column            |              Type              |                      Modifiers                      | Storage  | Stats target | Description 
-----------------------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
 id                          | integer                        | not null default nextval('advert_id_seq'::regclass) | plain    |              | 
 advert_category_id          | integer                        | not null                                            | plain    |              | 
Indexes:
    "idx_54f1f40bd4436821" btree (advert_category_id)
    "advert_created_date_idx" btree (created_date)
Foreign-key constraints:
    "fk_54f1f40bd4436821" FOREIGN KEY (advert_category_id) REFERENCES advertcategory(id) ON DELETE RESTRICT
Has OIDs: no

Category table (stripped down):

565 rows

                           Table "public.advertcategory"
  Column   |  Type   |                          Modifiers                          
-----------+---------+-------------------------------------------------------------
 id        | integer | not null default nextval('advertcategory_id_seq'::regclass)
 parent_id | integer | 
 active    | boolean | not null
 system    | boolean | not null
Indexes:
    "advertcategory_pkey" PRIMARY KEY, btree (id)
    "idx_d84ab8ea727aca70" btree (parent_id)
Foreign-key constraints:
    "fk_d84ab8ea727aca70" FOREIGN KEY (parent_id) REFERENCES advertcategory(id) ON DELETE RESTRICT

Short server config:

                                                   version                                                    
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit

            name            |  current_setting   |        source        
----------------------------+--------------------+----------------------
 shared_buffers             | 1800MB             | configuration file
 work_mem                   | 4MB                | configuration file

As you can see, none of the proper solutions give speed improvement. Only the UNION solution to split the OR statement improves performance. But I can't use that because this query is being used through my ORM framework with alot of other filter options. Plus if I can do this, why doesn't the optimizer do it? It seems a really simple optimization.

Any hints on this? A solution to this small problem would greatly appreciated!

290

asked May 29 '13 13:05

mauserrifle

2 Answers

Entirely new approach. Your where condition is on two tables, but that seems unnecessary.

The first change would be:

where a1_.id = 1136 or a1_.parent_id = 1136

I think the structure you want is a scan on the category table and then fetches from the advert table. To help, you can create an index on advert(advert_category_id, created_date).

I would be tempted to write the query by moving the where clause into a subquery. I don't know if this would effect performance:

SELECT a0_.id AS id0
FROM   advert a0_ INNER JOIN
       (select ac.*
        from advertcategory ac
        where ac.id = 1136 or ac.parent_id = 1136
       ) ac
       ON a0_.advert_category_id = ac.id
ORDER  BY a0_.created_date DESC
LIMIT  15;

163

answered Oct 12 '22 23:10

Gordon Linoff

"Plus if I can do this, why doesn't the optimizer do it?" -- Because there are all sorts of cases where it's not necessarily valid (due to aggregates in subqueries) or interesting (due to a better index) to do so.

The best query plan you'll possibly get is given in Gordon's answer, using union all instead of union to avoid the sort (I take it that a category is never its own parent, eliminating any possibility of dups).

Otherwise, note that your query could be rewritten like so:

SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  a1_.id IN ( 1136 )
        OR a1_.parent_id IN ( 1136 )
ORDER  BY a0_.created_date DESC
LIMIT  15;

In other words you're filtering based on a criteria from one table, and sort/limit based on another. The way you wrote it precludes you from being able to use a good index, because the planner isn't recognizing that the filter criteria is all from the same table, so it'll nestloop over created_date with a filter like you're currently doing. It's not a bad plan, mind you... It's actually the right thing to do if e.g. 1136 isn't very selective a criteria.

By making it explicit that the second table is the one of interest, you may end up with a bitmap heap scan when the category is selective enough, if you've indexes on advertcategory (id) (which you already have if it's the primary key) and on advertcategory (parent_id) (which you probably do not have at the moment). Don't count too much on it, though -- PG doesn't collect correlated column information insofar as I'm aware.

Another possibility might be to maintain an array with the aggregate categories (using triggers) in advert directly, and to use a GIST index on it:

SELECT a0_.id AS id0
FROM   advert a0_
WHERE  ARRAY[1136, 1137] && a0_.category_ids -- 1136 OR 1137; use <@ for AND
ORDER  BY a0_.created_date DESC
LIMIT  15;

It's technically redundant, but it works well to optimize this sort of query (i.e. filters over a nested category tree that yield complicated join criterias)... When PG decides to use it, you'll end up top-n sorting applicable ads. (In older PG versions, the selectivity of && was arbitrary from lack of stats; I vaguely remember reading a changelog whereby 9.1, 9.2 or 9.3 improved things, presumably by using code similar to that used by the tsvector content statistics collector for generic array types. At any rate, be sure to use the latest PG version, and be sure to not rewrite that query using operators that the gin/gist index won't be able to use.)

answered Oct 12 '22 23:10

Denis de Bernardy

Related questions
                            
                                Sql Server 2008 - Dropping a synonym
                            
                                Comment threading with score factoring
                            
                                How to create a histogram using MySQL
                            
                                return a default record from a sql query
                            
                                How do I print the queries executed by django .save() method?
                            
                                SQL Anywhere 11 - table size
                            
                                Stripe the order of a PostgreSQL result set
                            
                                How can I find out what SQL Server tables are linked to MS Access?
                            
                                MySQL server at 'reading initial communication packet', system error: 111 [closed]
                            
                                .NET 4.5 Universal Membership Provider table names
                            
                                Dynamically creating the IN clause in a stored procedure [duplicate]
                            
                                Can I use DBD::mysql quote() function without a database connection? [duplicate]
                            
                                SQL Query Time Complexity
                            
                                Split string by comma in SQL Server 2008
                            
                                JOIN versus EXISTS performance
                            
                                How to stop execution of sql scripts when an error is encountered
                            
                                Setting up a Database Diagram in VS2012 & SSDT?
                            
                                using if statement how to replace values
                            
                                Postgres partition by week
                            
                                SQL inner join vs subquery

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With