Why do I have one PostgreSQL server doing a Hash Join and another doing a Nested Loop Semi Join?

Tags:

postgresql

I have two different machines both running psql 8.4.7 with the same data (using pg_restore), but I'm getting different queries from the two. There are other functional differences between the two machines (one is CentOS and the other Ubuntu, and they use different gcc's), but I would have thought that they'd use more or less the same logic.

One machine is using a Hash Join, and it's going super fast (50ms). The other uses a Nested Loop and takes super long (10s). Any hints at how I can get to the bottom of this?

On one:

database_production=> EXPLAIN ANALYZE SELECT tags.*, COUNT(*) AS count FROM "tags" LEFT OUTER JOIN taggings ON tags.id = taggings.tag_id AND taggings.context = 'categories' INNER JOIN vendors ON vendors.id = taggings.taggable_id WHERE (taggings.taggable_type = 'Vendor') AND (taggings.taggable_id IN(SELECT vendors.id FROM "vendors" INNER JOIN "deals" ON "deals"."vendor_id" = "vendors"."id" INNER JOIN "programs" ON "programs"."id" = "deals"."program_id" INNER JOIN "memberships" ON "memberships"."program_id" = "programs"."id" WHERE (memberships.user_id = 1) AND (vendors.id NOT IN (SELECT vendor_id FROM vendor_ignores WHERE user_id = 1)))) GROUP BY tags.id, tags.name HAVING COUNT(*) > 0 ORDER BY count DESC;
                                                                                   QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=164.89..164.89 rows=1 width=520) (actual time=9444.003..9444.009 rows=6 loops=1)
   Sort Key: (count(*))
   Sort Method:  quicksort  Memory: 17kB
   ->  HashAggregate  (cost=164.86..164.88 rows=1 width=520) (actual time=9443.936..9443.942 rows=6 loops=1)
         Filter: (count(*) > 0)
         ->  Nested Loop Semi Join  (cost=14.92..164.85 rows=1 width=520) (actual time=67.355..9443.645 rows=94 loops=1)
               Join Filter: (public.vendors.id = deals.vendor_id)
               ->  Nested Loop  (cost=9.35..29.93 rows=1 width=528) (actual time=3.570..154.104 rows=7636 loops=1)
                     ->  Nested Loop  (cost=9.35..21.65 rows=1 width=524) (actual time=3.534..83.165 rows=7636 loops=1)
                           ->  Bitmap Heap Scan on taggings  (cost=9.35..13.37 rows=1 width=8) (actual time=3.476..12.277 rows=7636 loops=1)
                                 Recheck Cond: (((taggable_type)::text = 'Vendor'::text) AND ((context)::text = 'categories'::text))
                                 ->  BitmapAnd  (cost=9.35..9.35 rows=1 width=0) (actual time=3.410..3.410 rows=0 loops=1)
                                       ->  Bitmap Index Scan on index_taggings_on_taggable_type  (cost=0.00..4.55 rows=40 width=0) (actual time=1.664..1.664 rows=7636 loops=1)
                                             Index Cond: ((taggable_type)::text = 'Vendor'::text)
                                       ->  Bitmap Index Scan on index_taggings_on_context  (cost=0.00..4.55 rows=40 width=0) (actual time=1.727..1.727 rows=7648 loops=1)
                                             Index Cond: ((context)::text = 'categories'::text)
                           ->  Index Scan using tags_pkey on tags  (cost=0.00..8.27 rows=1 width=520) (actual time=0.004..0.005 rows=1 loops=7636)
                                 Index Cond: (tags.id = taggings.tag_id)
                     ->  Index Scan using vendors_pkey on vendors  (cost=0.00..8.27 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=7636)
                           Index Cond: (public.vendors.id = taggings.taggable_id)
               ->  Nested Loop  (cost=5.57..134.62 rows=24 width=8) (actual time=0.035..1.117 rows=93 loops=7636)
                     ->  Nested Loop  (cost=4.54..100.19 rows=24 width=4) (actual time=0.028..0.344 rows=93 loops=7636)
                           ->  Nested Loop  (cost=0.00..9.57 rows=1 width=8) (actual time=0.010..0.035 rows=3 loops=7636)
                                 ->  Seq Scan on memberships  (cost=0.00..1.29 rows=1 width=4) (actual time=0.004..0.009 rows=3 loops=7636)
                                       Filter: (user_id = 1)
                                 ->  Index Scan using programs_pkey on programs  (cost=0.00..8.27 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=22810)
                                       Index Cond: (programs.id = memberships.program_id)
                           ->  Bitmap Heap Scan on deals  (cost=4.54..90.16 rows=37 width=8) (actual time=0.012..0.042 rows=31 loops=22810)
                                 Recheck Cond: (deals.program_id = programs.id)
                                 ->  Bitmap Index Scan on index_deals_on_program_id  (cost=0.00..4.53 rows=37 width=0) (actual time=0.008..0.008 rows=31 loops=22810)
                                       Index Cond: (deals.program_id = programs.id)
                     ->  Index Scan using vendors_pkey on vendors  (cost=1.03..1.42 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=713413)
                           Index Cond: (public.vendors.id = deals.vendor_id)
                           Filter: (NOT (hashed SubPlan 1))
                           SubPlan 1
                             ->  Seq Scan on vendor_ignores  (cost=0.00..1.02 rows=1 width=4) (actual time=0.017..0.017 rows=0 loops=1)
                                   Filter: (user_id = 1)
 Total runtime: 9444.501 ms
(38 rows)

On the other, the result actually shows up in a separate screen:

    QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=977.70..978.94 rows=496 width=23) (actual time=49.859..49.863 rows=6 loops=1)
   Sort Key: (count(*))
   Sort Method:  quicksort  Memory: 17kB
   ->  HashAggregate  (cost=946.81..955.49 rows=496 width=23) (actual time=49.793..49.801 rows=6 loops=1)
         Filter: (count(*) > 0)
         ->  Hash Join  (cost=440.07..875.99 rows=7082 width=23) (actual time=28.661..49.599 rows=94 loops=1)
               Hash Cond: (taggings.tag_id = tags.id)
               ->  Hash Join  (cost=424.91..763.45 rows=7082 width=4) (actual time=27.388..48.105 rows=94 loops=1)
                     Hash Cond: (taggings.taggable_id = public.vendors.id)
                     ->  Seq Scan on taggings  (cost=0.00..184.72 rows=7378 width=8) (actual time=0.030..12.791 rows=7636 loops=1)
                           Filter: (((context)::text = 'categories'::text) AND ((taggable_type)::text = 'Vendor'::text))
                     ->  Hash  (cost=331.68..331.68 rows=7458 width=12) (actual time=27.226..27.226 rows=94 loops=1)
                           ->  Nested Loop  (cost=134.67..331.68 rows=7458 width=12) (actual time=26.056..27.084 rows=94 loops=1)
                                 ->  HashAggregate  (cost=134.67..134.98 rows=31 width=8) (actual time=26.047..26.153 rows=94 loops=1)
                                       ->  Nested Loop  (cost=1.03..134.59 rows=31 width=8) (actual time=24.422..25.890 rows=94 loops=1)
                                             ->  Nested Loop  (cost=0.00..49.95 rows=59 width=4) (actual time=14.902..15.359 rows=94 loops=1)
                                                   ->  Nested Loop  (cost=0.00..9.57 rows=1 width=8) (actual time=0.108..0.143 rows=3 loops=1)
                                                         ->  Seq Scan on memberships  (cost=0.00..1.29 rows=1 width=4) (actual time=0.050..0.057 rows=3 loops=1)
                                                               Filter: (user_id = 1)
                                                         ->  Index Scan using programs_pkey on programs  (cost=0.00..8.27 rows=1 width=4) (actual time=0.020..0.022 rows=1 loops=3)
                                                               Index Cond: (programs.id = memberships.program_id)
                                                   ->  Index Scan using index_deals_on_program_id on deals  (cost=0.00..39.64 rows=59 width=8) (actual time=4.943..5.005 rows=31 loops=3)
                                                         Index Cond: (deals.program_id = programs.id)
                                             ->  Index Scan using vendors_pkey on vendors  (cost=1.03..1.42 rows=1 width=4) (actual time=0.106..0.108 rows=1 loops=94)
                                                   Index Cond: (public.vendors.id = deals.vendor_id)
                                                   Filter: (NOT (hashed SubPlan 1))
                                                   SubPlan 1
                                                     ->  Seq Scan on vendor_ignores  (cost=0.00..1.02 rows=1 width=4) (actual time=0.022..0.022 rows=0 loops=1)
                                                           Filter: (user_id = 1)
                                 ->  Index Scan using vendors_pkey on vendors  (cost=0.00..6.33 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=94)
                                       Index Cond: (public.vendors.id = deals.vendor_id)
               ->  Hash  (cost=8.96..8.96 rows=496 width=23) (actual time=1.257..1.257 rows=496 loops=1)
                     ->  Seq Scan on tags  (cost=0.00..8.96 rows=496 width=23) (actual time=0.051..0.619 rows=496 loops=1)
 Total runtime: 50.357 ms
(34 rows)

843

asked Jun 15 '11 19:06

Steven

1 Answers

The costs noted in the query plan vary a lot, which Postgres calculates based on index statistics. This information is taken into account by the query planner based which action is projected to be most efficient.

If you have different data in each database, then the index statistics will differ, resulting in a different query plan. If the data's the same, you may simply have out of date index statistics for one or both database(s). VACUUM ANALYZE the relevant tables, and try again.

Edit: Apparently VACUUM ANALYZE did the trick in your case. Next steps:

Make sure autovacuum is working
- You're using 8.4 so the default configuration is probably fine. However, with older versions of postgres, enabling and tuning autovaccum was more challenging
After large writes (deleting/inserting large numbers of rows, or importing from a pg_dump), run a VACUUM ANALYZE to update your indexes.

198

answered Apr 01 '23 18:04

Frank Farmer

Related questions
                            
                                Create index on foreign table PostgreSQL
                            
                                Automatic backup of PostgreSQL database in Ubuntu
                            
                                How to delete postgresql databases matching a pattern?
                            
                                RETURN QUERY vs RETURN QUERY EXECUTE
                            
                                Upsert error (On Conflict Do Update) pointing to duplicate constrained values
                            
                                PostgreSQL ERROR: payload string too long
                            
                                Row level security(RLS) performance is significantly slower in postgres.
                            
                                How do I compare a time difference with a number of seconds in Postgres?
                            
                                pg_dump windows command prompt invalid command
                            
                                Prevent DateRangeField overlap in Django model?
                            
                                SQL duplicate rows with multiple left joins
                            
                                Can't use migrations in EF Core: "42P07: relation "AspNetRoles" already exists"
                            
                                Division by zero handled differently
                            
                                pg.rb segmentation fault [Mojave upgrade]
                            
                                Remove sensitive information from environment variables in postgres docker container
                            
                                Composite primary key in Sequelize
                            
                                Android OS and postgreSQL
                            
                                Where is temporary table created?
                            
                                PostgreSQL 'Deferrable Delete' still hits constraint on Delete
                            
                                Escaping quotes inside text when dumping Postgres Sql

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With