Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL Select statement very slow due to small JOIN/WHERE filter

2013-05-29: Updated question with latest configuration and extra info. Earlier I was testing in a virtualbox image. Now I am testing on the productive server, which reflects the real world much better. Question should be fully clear now. If you have helped me before, read again carefully

Currently I have found a query that is very slow in PostgreSQL, though I do not understand how it can be slow. I scaled it down a bit so it's much smaller to post here (and much faster, but still slow!).

Little background: In this project I have adverts that belong to users. Users are part of an area within the country. An area can have multiple child areas, so the area table is a tree. A network is assigned to an area. When filtering on a network it should filter on that area and all its area childs in the tree. Because I can't query against a endless tree I have table that flattens this full tree.

So with 1 query (SELECT area_id FROM network_area_flatdeep WHERE network_id = 1) I get all areas that belong to network 1: 63, 64, 65, 66, 67, 68, 69, 70

This makes querying very easy.

The slow query (Before testing everything has been VACUUM ANALYZED):

EXPLAIN ANALYZE SELECT a0_.id  AS id0
FROM   advert a0_
       INNER JOIN member m6_
               ON a0_.user_id = m6_.id
       INNER JOIN area a7_
               ON m6_.area_id = a7_.id
WHERE  a0_.status IN ( 1 )
       AND m6_.status IN ( 1 )
       AND a7_.id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1))
ORDER  BY a0_.created_date DESC
LIMIT  60;
                                                                                 QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=41.10..34695.53 rows=60 width=12) (actual time=9.327..134.581 rows=60 loops=1)
   ->  Nested Loop  (cost=41.10..2291276.69 rows=3967 width=12) (actual time=9.326..134.534 rows=60 loops=1)
         Join Filter: (a7_.id = m6_.area_id)
         Rows Removed by Join Filter: 22566
         ->  Nested Loop  (cost=41.10..821630.16 rows=317633 width=24) (actual time=0.049..39.638 rows=22666 loops=1)
               ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..762000.64 rows=317633 width=16) (actual time=0.013..4.357 rows=2834 loops=1)
                     Filter: (status = 1)
                     Rows Removed by Filter: 21
               ->  Materialize  (cost=41.10..73.38 rows=15 width=8) (actual time=0.000..0.004 rows=8 loops=2834)
                     ->  Nested Loop  (cost=41.10..73.30 rows=15 width=8) (actual time=0.031..0.073 rows=8 loops=1)
                           ->  HashAggregate  (cost=41.10..41.18 rows=8 width=4) (actual time=0.023..0.026 rows=8 loops=1)
                                 ->  Bitmap Heap Scan on network_area_flatdeep  (cost=4.37..41.06 rows=15 width=4) (actual time=0.011..0.015 rows=8 loops=1)
                                       Recheck Cond: (network_id = 1)
                                       ->  Bitmap Index Scan on idx_c29e880034128b91  (cost=0.00..4.36 rows=15 width=0) (actual time=0.007..0.007 rows=8 loops=1)
                                             Index Cond: (network_id = 1)
                           ->  Index Only Scan using area_pkey on area a7_  (cost=0.00..4.01 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=8)
                                 Index Cond: (id = network_area_flatdeep.area_id)
                                 Heap Fetches: 8
         ->  Index Scan using member_pkey on member m6_  (cost=0.00..4.61 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=22666)
               Index Cond: (id = a0_.user_id)
               Filter: (status = 1)
               Rows Removed by Filter: 0
 Total runtime: 134.698 ms
(23 rows)

The subquery itself is:

EXPLAIN ANALYZE SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1);
                                                        QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on network_area_flatdeep  (cost=4.37..41.06 rows=15 width=4) (actual time=0.020..0.024 rows=8 loops=1)
   Recheck Cond: (network_id = 1)
   ->  Bitmap Index Scan on idx_c29e880034128b91  (cost=0.00..4.36 rows=15 width=0) (actual time=0.012..0.012 rows=8 loops=1)
         Index Cond: (network_id = 1)
 Total runtime: 0.051 ms
(5 rows)

which results in : 63, 64, 65, 66, 67, 68, 69, 70

So I've tried to hard code in the ids. And expected it to be faster (but is not):

EXPLAIN ANALYZE SELECT a0_.id  AS id0
FROM   advert a0_
       INNER JOIN member m6_
               ON a0_.user_id = m6_.id
       INNER JOIN area a7_
                   ON m6_.area_id = a7_.id
WHERE  a0_.status IN ( 1 )
       AND m6_.status IN ( 1 )
       AND a7_.id IN (63, 64, 65, 66, 67, 68, 69, 70)
ORDER  BY a0_.created_date DESC
LIMIT  60;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17558.82..17558.97 rows=60 width=12) (actual time=56.594..56.670 rows=60 loops=1)
   ->  Sort  (cost=17558.82..17560.07 rows=498 width=12) (actual time=56.593..56.621 rows=60 loops=1)
         Sort Key: a0_.created_date
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Nested Loop  (cost=0.00..17541.62 rows=498 width=12) (actual time=0.047..53.808 rows=4478 loops=1)
               ->  Nested Loop  (cost=0.00..3903.99 rows=286 width=4) (actual time=0.027..17.492 rows=8004 loops=1)
                     ->  Seq Scan on area a7_  (cost=0.00..144.78 rows=8 width=4) (actual time=0.007..0.823 rows=8 loops=1)
                           Filter: (id = ANY ('{63,64,65,66,67,68,69,70}'::integer[]))
                           Rows Removed by Filter: 5081
                     ->  Index Scan using idx_70e4fa78bd0f409c on member m6_  (cost=0.00..468.38 rows=152 width=8) (actual time=0.011..1.208 rows=1000 loops=8)
                           Index Cond: (area_id = a7_.id)
                           Filter: (status = 1)
                           Rows Removed by Filter: 2
               ->  Index Scan using idx_54f1f40ba76ed395 on advert a0_  (cost=0.00..47.49 rows=19 width=16) (actual time=0.002..0.003 rows=1 loops=8004)
                     Index Cond: (user_id = m6_.id)
                     Filter: (status = 1)
                     Rows Removed by Filter: 1
 Total runtime: 56.744 ms
(18 rows)

Time: 57.995 ms

I tried to put the subquery in the INNER JOIN:

EXPLAIN ANALYZE SELECT a0_.id  AS id0
FROM   advert a0_
       INNER JOIN member m6_
               ON a0_.user_id = m6_.id
       INNER JOIN area a7_
               ON m6_.area_id = a7_.id 
               AND m6_.area_id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN     (1))
WHERE  a0_.status IN ( 1 )
       AND m6_.status IN ( 1 )
ORDER  BY a0_.created_date DESC
LIMIT  60;
                                                                                QUERY PLAN                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4.37..34966.73 rows=60 width=12) (actual time=2.957..42.443 rows=60 loops=1)
   ->  Nested Loop  (cost=4.37..2311599.10 rows=3967 width=12) (actual time=2.956..42.394 rows=60 loops=1)
         ->  Nested Loop Semi Join  (cost=4.37..2302173.51 rows=3967 width=20) (actual time=2.949..42.099 rows=60 loops=1)
               Join Filter: (m6_.area_id = network_area_flatdeep.area_id)
               Rows Removed by Join Filter: 22333
               ->  Nested Loop  (cost=0.00..2230853.09 rows=316797 width=16) (actual time=0.028..18.612 rows=2829 loops=1)
                     ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..762000.64 rows=317633 width=16) (actual time=0.012..3.802 rows=2834 loops=1)
                           Filter: (status = 1)
                           Rows Removed by Filter: 21
                     ->  Index Scan using member_pkey on member m6_  (cost=0.00..4.61 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=2834)
                           Index Cond: (id = a0_.user_id)
                           Filter: (status = 1)
                           Rows Removed by Filter: 0
               ->  Materialize  (cost=4.37..41.14 rows=15 width=4) (actual time=0.000..0.004 rows=8 loops=2829)
                     ->  Bitmap Heap Scan on network_area_flatdeep  (cost=4.37..41.06 rows=15 width=4) (actual time=0.009..0.015 rows=8 loops=1)
                           Recheck Cond: (network_id = 1)
                           ->  Bitmap Index Scan on idx_c29e880034128b91  (cost=0.00..4.36 rows=15 width=0) (actual time=0.006..0.006 rows=8 loops=1)
                                 Index Cond: (network_id = 1)
         ->  Index Only Scan using area_pkey on area a7_  (cost=0.00..2.37 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=60)
               Index Cond: (id = m6_.area_id)
               Heap Fetches: 60
 Total runtime: 42.538 ms
(22 rows)

I tried to get rid of the subquery and make a proper JOIN statement to network_area_flatdeep (I STRONGLY PREFER THIS VERSION):

EXPLAIN ANALYZE SELECT a0_.id  AS id0
FROM   advert a0_
       INNER JOIN member m6_
               ON a0_.user_id = m6_.id
       INNER JOIN area a7_
               ON m6_.area_id = a7_.id 
       INNER JOIN network_area_flatdeep n14_
               ON a7_.id = n14_.area_id
                  AND ( n14_.network_id IN ( 1 ) )
WHERE  a0_.status IN ( 1 )
       AND m6_.status IN ( 1 )
ORDER  BY a0_.created_date DESC
LIMIT  60;
                                                                                   QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=30031.18..30031.33 rows=60 width=12) (actual time=62.968..63.045 rows=60 loops=1)
   ->  Sort  (cost=30031.18..30033.51 rows=934 width=12) (actual time=62.967..62.991 rows=60 loops=1)
         Sort Key: a0_.created_date
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Nested Loop  (cost=0.00..29998.92 rows=934 width=12) (actual time=0.157..60.280 rows=4478 loops=1)
               ->  Nested Loop  (cost=0.00..4401.66 rows=536 width=4) (actual time=0.029..20.488 rows=8004 loops=1)
                     ->  Nested Loop  (cost=0.00..120.69 rows=15 width=8) (actual time=0.015..0.084 rows=8 loops=1)
                           ->  Index Scan using idx_c29e880034128b91 on network_area_flatdeep n14_  (cost=0.00..60.47 rows=15 width=4) (actual time=0.009..0.019 rows=8 loops=1)
                                 Index Cond: (network_id = 1)
                           ->  Index Only Scan using area_pkey on area a7_  (cost=0.00..4.01 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=8)
                                 Index Cond: (id = n14_.area_id)
                                 Heap Fetches: 8
                     ->  Index Scan using idx_70e4fa78bd0f409c on member m6_  (cost=0.00..283.88 rows=152 width=8) (actual time=0.011..1.278 rows=1000 loops=8)
                           Index Cond: (area_id = a7_.id)
                           Filter: (status = 1)
                           Rows Removed by Filter: 2
               ->  Index Scan using idx_54f1f40ba76ed395 on advert a0_  (cost=0.00..47.57 rows=19 width=16) (actual time=0.003..0.003 rows=1 loops=8004)
                     Index Cond: (user_id = m6_.id)
                     Filter: (status = 1)
                     Rows Removed by Filter: 1
 Total runtime: 63.125 ms
(21 rows)

Changing IN to = doesn't helper either.

I tried using EXISTS as suggested by wilderplasser with an answer below.

When I either remove the ORDER BY or the network criteria, the query is fast (2ms). But why can't they play nice together?

Now then... when I change the network_id from '1' to '10'. The query is extremely fast like I want. 10 is a root network that contains ALL areas. So apperently the LESS results the JOIN needs to filter out, the faster the query is.

The structure looks like this:

area (mapped to network 10 which contains 5089 areas in flat table and is FAST)
|    |---- area 1 (network 1 which contains 8 areas in flat table and is SLOW)
|        |--- more areas
|-- ALOT MORE areas
|-- etc

Changing network 1 to 10 gives: 3.265 ms

To summarize the difference:

network 1 has 8 areas in network_area_flatdeep network 10 has 5089 areas in network_area_flatdeep

So INNER JOIN using network 1 is very slow, INNER JOIN using network 10 is very fast. Same behavior with subqueries.

Advert table

353804 rows
                                                                   Table "public.advert"
           Column            |              Type              |                      Modifiers                      | Storage  | Stats target | Description 
-----------------------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
 id                          | integer                        | not null default nextval('advert_id_seq'::regclass) | plain    |              | 
 user_id                     | integer                        | not null                                            | plain    |              | 
 advert_category_id          | integer                        | not null                                            | plain    |              | 
 currency_id                 | integer                        | not null                                            | plain    |              | 
 advert_kind_id              | integer                        | not null                                            | plain    |              | 
 advert_price_id             | integer                        |                                                     | plain    |              | 
 external_source_id          | integer                        |                                                     | plain    |              | 
 status                      | integer                        | not null                                            | plain    |              | 
 type                        | integer                        | not null                                            | plain    |              | 
 title                       | character varying(60)          | not null                                            | extended |              | 
 description                 | text                           | not null                                            | extended |              | 
 price                       | numeric(19,2)                  | default NULL::numeric                               | main     |              | 
 accepting_bids              | boolean                        | not null                                            | plain    |              | 
 promoted                    | boolean                        | not null                                            | plain    |              | 
 edited_date                 | timestamp(0) without time zone | default NULL::timestamp without time zone           | plain    |              | 
 created_date                | timestamp(0) without time zone | not null                                            | plain    |              | 
 archived_date               | timestamp(0) without time zone | default NULL::timestamp without time zone           | plain    |              | 
 views                       | integer                        | not null                                            | plain    |              | 
 checked_date                | timestamp(0) without time zone | default NULL::timestamp without time zone           | plain    |              | 
 archived_by_cron            | boolean                        | not null                                            | plain    |              | 
 unarchived_by_cron          | boolean                        | not null                                            | plain    |              | 
 containting_forbidden_words | boolean                        | not null                                            | plain    |              | 
 external_id                 | character varying(255)         | default NULL::character varying                     | extended |              | 
 new_product                 | boolean                        | not null                                            | plain    |              | 
Indexes:
    "advert_pkey" PRIMARY KEY, btree (id)
    "advert_external_idx_uq" UNIQUE, btree (external_id, external_source_id)
    "advert_archived_date_idx" btree (archived_date)
    "advert_checked_date_idx" btree (checked_date)
    "advert_created_date_idx" btree (created_date)
    "advert_edited_date_idx" btree (edited_date)
    "advert_external_id_idx" btree (external_id)
    "advert_price_idx" btree (price)
    "advert_status_idx" btree (status)
    "advert_type_idx" btree (type)
    "advert_views_idx" btree (views)
    "idx_54f1f40b38248176" btree (currency_id)
    "idx_54f1f40b54b67d66" btree (advert_price_id)
    "idx_54f1f40b9a2e6cff" btree (advert_kind_id)
    "idx_54f1f40ba76ed395" btree (user_id)
    "idx_54f1f40bb167b375" btree (external_source_id)
    "idx_54f1f40bd4436821" btree (advert_category_id)
Foreign-key constraints:
    "fk_54f1f40b38248176" FOREIGN KEY (currency_id) REFERENCES currency(id) ON DELETE RESTRICT
    "fk_54f1f40b54b67d66" FOREIGN KEY (advert_price_id) REFERENCES advertprice(id) ON DELETE RESTRICT
    "fk_54f1f40b9a2e6cff" FOREIGN KEY (advert_kind_id) REFERENCES advertkind(id) ON DELETE RESTRICT
    "fk_54f1f40ba76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
    "fk_54f1f40bb167b375" FOREIGN KEY (external_source_id) REFERENCES externalsource(id) ON DELETE RESTRICT
    "fk_54f1f40bd4436821" FOREIGN KEY (advert_category_id) REFERENCES advertcategory(id) ON DELETE RESTRICT
Referenced by:
    TABLE "advert_photo" CONSTRAINT "fk_1c939974d07eccb6" FOREIGN KEY (advert_id) REFERENCES advert(id) ON DELETE CASCADE
    TABLE "banner" CONSTRAINT "fk_6f9db8e7d07eccb6" FOREIGN KEY (advert_id) REFERENCES advert(id) ON DELETE SET NULL
    TABLE "advertbid" CONSTRAINT "fk_fccdba75d07eccb6" FOREIGN KEY (advert_id) REFERENCES advert(id) ON DELETE CASCADE
Has OIDs: no

Area table:

5089 rows
                                               Table "public.area"
   Column   |  Type   |                     Modifiers                     | Storage | Stats target | Description 
------------+---------+---------------------------------------------------+---------+--------------+-------------
 id         | integer | not null default nextval('area_id_seq'::regclass) | plain   |              | 
 network_id | integer |                                                   | plain   |              | 
 parent_id  | integer |                                                   | plain   |              | 
 selectable | boolean | not null                                          | plain   |              | 
Indexes:
    "area_pkey" PRIMARY KEY, btree (id)
    "idx_d7943d6834128b91" btree (network_id)
    "idx_d7943d68727aca70" btree (parent_id)
Foreign-key constraints:
    "fk_d7943d6834128b91" FOREIGN KEY (network_id) REFERENCES network(id) ON DELETE RESTRICT
    "fk_d7943d68727aca70" FOREIGN KEY (parent_id) REFERENCES area(id) ON DELETE CASCADE
Referenced by:
    TABLE "network_area_flat" CONSTRAINT "fk_10aae5b2bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
    TABLE "area_language" CONSTRAINT "fk_17d42f7dbd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
    TABLE "area_zip_code" CONSTRAINT "fk_62a3bf90bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
    TABLE "member" CONSTRAINT "fk_70e4fa78bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE RESTRICT
    TABLE "network_area_flatdeep" CONSTRAINT "fk_c29e8800bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
    TABLE "area" CONSTRAINT "fk_d7943d68727aca70" FOREIGN KEY (parent_id) REFERENCES area(id) ON DELETE CASCADE
Has OIDs: no

Member table:

182450 rows
                                                               Table "public.member"
        Column         |              Type              |                      Modifiers                      | Storage  | Stats target | Description 
-----------------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
 id                    | integer                        | not null default nextval('member_id_seq'::regclass) | plain    |              | 
 language_id           | integer                        | not null                                            | plain    |              | 
 area_id               | integer                        | not null                                            | plain    |              | 
 company_id            | integer                        |                                                     | plain    |              | 
 external_source_id    | integer                        |                                                     | plain    |              | 
 email                 | character varying(255)         | not null                                            | extended |              | 
 password              | character varying(40)          | not null                                            | extended |              | 
 status                | integer                        | not null                                            | plain    |              | 
 name                  | character varying(150)         | not null                                            | extended |              | 
 zip_code              | character varying(20)          |                                                     | extended |              | 
 phone_number          | character varying(120)         | default NULL::character varying                     | extended |              | 
 using_email_service   | boolean                        | not null                                            | plain    |              | 
 edited_date           | timestamp(0) without time zone | default NULL::timestamp without time zone           | plain    |              | 
 created_date          | timestamp(0) without time zone | not null                                            | plain    |              | 
 hiding_on_own_network | boolean                        | not null                                            | plain    |              | 
 staff                 | boolean                        | not null                                            | plain    |              | 
 superuser             | boolean                        | not null                                            | plain    |              | 
 external_id           | character varying(255)         | default NULL::character varying                     | extended |              | 
 last_login_date       | timestamp(0) without time zone | default NULL::timestamp without time zone           | plain    |              | 
 deleted_adverts       | integer                        | not null                                            | plain    |              | 
Indexes:
    "member_pkey" PRIMARY KEY, btree (id)
    "user_email_idx_uq" UNIQUE, btree (email)
    "user_external_idx_uq" UNIQUE, btree (external_id, external_source_id)
    "idx_70e4fa7882f1baf4" btree (language_id)
    "idx_70e4fa78979b1ad6" btree (company_id)
    "idx_70e4fa78b167b375" btree (external_source_id)
    "idx_70e4fa78bd0f409c" btree (area_id)
    "user_external_id_idx" btree (external_id)
    "user_name_idx" btree (name)
    "user_status_idx" btree (status)
Foreign-key constraints:
    "fk_70e4fa7882f1baf4" FOREIGN KEY (language_id) REFERENCES language(id) ON DELETE RESTRICT
    "fk_70e4fa78979b1ad6" FOREIGN KEY (company_id) REFERENCES company(id) ON DELETE SET NULL
    "fk_70e4fa78b167b375" FOREIGN KEY (external_source_id) REFERENCES externalsource(id) ON DELETE RESTRICT
    "fk_70e4fa78bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE RESTRICT
Referenced by:
    TABLE "user_link" CONSTRAINT "fk_4c2dd538a76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
    TABLE "advert" CONSTRAINT "fk_54f1f40ba76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
    TABLE "banner" CONSTRAINT "fk_6f9db8e7a76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE SET NULL
    TABLE "user_admin_module_permission" CONSTRAINT "fk_74fee7cea76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
    TABLE "user_admin_resource_permission" CONSTRAINT "fk_c9fcf279a76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
Has OIDs: no

Network_area_flatdeep table:

10177 rows
                                                           Table "public.network_area_flatdeep"
    Column    |              Type              |                             Modifiers                              | Storage | Stats target | Description 
--------------+--------------------------------+--------------------------------------------------------------------+---------+--------------+-------------
 id           | integer                        | not null default nextval('network_area_flatdeep_id_seq'::regclass) | plain   |              | 
 network_id   | integer                        | not null                                                           | plain   |              | 
 area_id      | integer                        | not null                                                           | plain   |              | 
 created_date | timestamp(0) without time zone | not null                                                           | plain   |              | 
Indexes:
    "network_area_flatdeep_pkey" PRIMARY KEY, btree (id)
    "area_flatdeep_idx_uq" UNIQUE, btree (area_id, network_id, created_date)
    "idx_c29e880034128b91" btree (network_id)
    "idx_c29e8800bd0f409c" btree (area_id)
Foreign-key constraints:
    "fk_c29e880034128b91" FOREIGN KEY (network_id) REFERENCES network(id) ON DELETE CASCADE
    "fk_c29e8800bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
Has OIDs: no

Short server config:

                                                   version                                                    
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit

            name            |  current_setting   |        source        
----------------------------+--------------------+----------------------
 shared_buffers             | 1800MB             | configuration file
 work_mem                   | 4MB                | configuration file

Conclusion:

The more data (less filtered out), the faster the query is. I'm a bit clueless now. How can PostgreSQL be so slow on this? Again: 40ms doesn't look extremely slow, but with the real query with a big SELECT statement, queries are 1.5s most of the time and can take up to 3s.

All filtering is done on indexes. The ordering is done on an index too.

Does anyone have an idea how to improve this simple filter I clearly need to separate data?

like image 572
mauserrifle Avatar asked Apr 10 '13 14:04

mauserrifle


People also ask

How to make SELECT query faster in PostgreSQL?

Some of the tricks we used to speed up SELECT-s in PostgreSQL: LEFT JOIN with redundant conditions, VALUES, extended statistics, primary key type conversion, CLUSTER, pg_hint_plan + bonus. Photo by Richard Jacobs on Unsplash.

What is hash join Postgres?

An implementation of join in which one of the collections of rows to be joined is hashed on the join keys using a separate 'Hash' node. Postgres then iterates over the other collection of rows, for each one looking it up in the hash table to see if there are any rows it should be joined to.

What is filter in PostgreSQL?

The filter clause extends aggregate functions ( sum , avg , count , …) by an additional where clause. The result of the aggregate is built from only the rows that satisfy the additional where clause too.

How does SELECT query work in PostgreSQL?

The PostgreSQL SELECT statement retrieves data from a single or several tables in a database, and returns the data in a result table, called a result-set. Use the SELECT statement to return one or more rows matching the specified criteria from the database tables.


1 Answers

Try to replace the IN(subquery) by the corresponding EXISTS(correlated subquery)

SELECT a0_.id  AS id0
FROM   advert a0_
INNER JOIN member m6_
               ON a0_.user_id = m6_.id
INNER JOIN area a7_
               ON m6_.area_id = a7_.id 
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
AND EXISTS (
     SELECT*
     FROM network_area_flatdeep xx
     WHERE xx.area_id = m6_.area_id
     AND xx.network_id IN  (2)
     )
ORDER  BY a0_.created_date DESC
LIMIT  60
   ;

An you do need of course primary keys and relavant foreign keys. I don'rt know about network_area_flatdeep.area_id , if it is not a PK or FK, you might need an index on it (or make it part of a composite PK)

UPDATE: a testbed with synthetic data:

DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

CREATE Table area
 ( id   SERIAL  not null PRIMARY KEY
 , zzzz varchar
        );

CREATE Table member
 ( id   SERIAL  not null PRIMARY KEY
 , language_id  integer not null DEFAULT 0
 , area_id      integer not null REFERENCES area(id)
 , company_id   integer
 , external_source_id   integer
 , status       integer not null  DEFAULT 0
 , name varchar not null
 , zip_code     varchar not null
 , phone_number varchar default NULL
 , using_email_service  boolean not null DEFAULT False
 , edited_date  timestamp(0) without time zone  default NULL
 , created_date timestamp(0) without time zone  not null
 , hiding_on_own_network        boolean not null DEFAULT False
 , staff        boolean not null DEFAULT False
 , superuser    boolean not null DEFAULT False
 , external_id  varchar default NULL
        );

CREATE TABLE advert
 ( id   SERIAL NOT NULL PRIMARY KEY
 , user_id      integer NOT NULL  REFERENCES member(id)
 , advert_category_id   integer NOT NULL DEFAULT 0
 , currency_id  integer NOT NULL DEFAULT 0
 , advert_kind_id       integer NOT NULL DEFAULT 0
 , advert_price_id      integer
 , external_source_id   integer
 , status       integer NOT NULL DEFAULT 0
 , type integer NOT NULL DEFAULT 0
 , title        varchar NOT NULL
 , description  text NOT NULL
 , price        numeric(10,2) default NULL
 , accepting_bids       boolean NOT NULL DEFAULT False
 , promoted     boolean NOT NULL  DEFAULT False
 , edited_date  timestamp(0) without time zone default NULL
 , created_date timestamp(0) without time zone NOT NULL
 , archived_date        timestamp(0) without time zone  default NULL
 , views        integer NOT NULL  DEFAULT 0
 , checked_date timestamp(0) without time zone default NULL
 , archived_by_cron     boolean NOT NULL  DEFAULT False
 , unarchived_by_cron   boolean NOT NULL  DEFAULT False
 , containting_forbidden_words  boolean NOT NULL  DEFAULT False
 , external_id  varchar default NULL
        );

CREATE INDEX advert_created_date_idx ON advert (created_date);

CREATE Table network_area_flatdeep
        -- the surrogate key in a junction table is questionable
 ( id   SERIAL not null PRIMARY KEY
 , network_id   integer not null -- REFERENCES network(id) ON DELETE CASCADE
 , area_id      integer not null REFERENCES area(id) ON DELETE CASCADE
 , created_date timestamp(0) without time zone not null
        -- the date in the below constraint is questionable
 , CONSTRAINT area_flatdeep_idx_uq UNIQUE (area_id, network_id, created_date)
        );
CREATE INDEX idx_c29e880034128b91 ON network_area_flatdeep(network_id);
CREATE INDEX idx_c29e8800bd0f409c ON network_area_flatdeep(area_id);
INSERT INTO area ( zzzz)
SELECT 'Zzzz_' || gs::text
FROM generate_series(1,39) gs
        ;

INSERT INTO network_area_flatdeep
        -- the surrogate key in a junction table is questionable
 ( network_id , area_id , created_date)
SELECT gs % 7 , aa.id, now()
FROM generate_series(1,76) gs
JOIN area aa ON aa.id = 1+gs % 39
        ;


INSERT INTO member
 ( area_id , name , zip_code, created_date)
SELECT aa.id
        , 'Member_'|| gs::text
        , 'Code_'|| gs::text
        , now()
FROM generate_series(1, 10086) gs
JOIN area aa ON aa.id = 1 + gs % 39
        ;
INSERT INTO advert( user_id , title, description, edited_date , created_date)
SELECT 1+ (gs* 321) % 10086
        , 'Tit_'|| gs::text
        , 'Desc_'|| gs::text
        , now()
        , now() - (random() * 10000 * '1 sec'::interval)
 from generate_series(1,47569 ) gs
        ;

UPDATE member SET status = 1 WHERE random() < .3;
UPDATE advert SET status = 1 WHERE random() < .3;

VACUUM ANALYZE member;
VACUUM ANALYZE advert;
VACUUM ANALYZE area;
VACUUM ANALYZE network_area_flatdeep;

My query executes in 12ms (PG-9.1), with the same plan as the OPs. (so this must be a configuration issue)

like image 77
wildplasser Avatar answered Oct 19 '22 13:10

wildplasser