PostgreSQL Query Slows Inexplicably with Addition of WHERE Constraint

Tags: sql, postgresql

I have the following PostgreSQL query, which contains a few subqueries. The query runs almost instantly until I add the WHERE lb.type = 'Marketing' constraint, which causes it to take about 3 minutes. I find it inexplicable that the addition of such a simple constraint causes such an extreme slowdown, but I'm guessing it must point to a fundamental flaw in my approach.

I'm hoping for assistance on a few fronts:

  1. Is my use of subqueries to select the latest records from specific tables appropriate, or could it cause performance issues?
  2. What should I be looking for in the execution plan when trying to diagnose issues?
  3. How should I go about determining what indexes to create for complex queries such as these?
  4. Why could the additional WHERE constraint be causing such a massive slowdown?

The table structure is as follows:

CREATE TABLE sales.leads
(
    lead_id integer NOT NULL DEFAULT nextval('sales.leads_lead_id_seq'::regclass),
    batch_id integer,
    expired integer NOT NULL DEFAULT 0,
    closed integer NOT NULL DEFAULT 0,
    merged integer NOT NULL DEFAULT 0,
    CONSTRAINT leads_pkey PRIMARY KEY (lead_id)
);

CREATE TABLE sales.lead_batches
(
    batch_id integer NOT NULL DEFAULT nextval('sales.lead_batches_batch_id_seq'::regclass),
    inserted_datetime timestamp without time zone,
    type character varying(100) COLLATE pg_catalog."default",
    uploaded smallint NOT NULL DEFAULT '0'::smallint,
    CONSTRAINT lead_batches_pkey PRIMARY KEY (batch_id)
);

CREATE TABLE sales.lead_results
(
    lead_result_id integer NOT NULL DEFAULT nextval('sales.lead_results_lead_result_id_seq'::regclass),
    lead_id integer,
    assigned_datetime timestamp without time zone NOT NULL,
    user_id character varying(255) COLLATE pg_catalog."default" NOT NULL,
    resulted_datetime timestamp without time zone,
    result character varying(255) COLLATE pg_catalog."default",
    CONSTRAINT lead_results_pkey PRIMARY KEY (lead_result_id)
);

CREATE TABLE sales.personal_details
(
    lead_id integer,
    title character varying(50) COLLATE pg_catalog."default",
    first_name character varying(100) COLLATE pg_catalog."default",
    surname character varying(255) COLLATE pg_catalog."default",
    email_address character varying(100) COLLATE pg_catalog."default",
    updated_date date NOT NULL
);

CREATE TABLE sales.users
(
    user_id character varying(50) COLLATE pg_catalog."default" NOT NULL,
    surname character varying(255) COLLATE pg_catalog."default",
    name character varying(255) COLLATE pg_catalog."default"
);

Query:

SELECT l.*, pd.*, lr.resulted_datetime, u.name
    FROM sales.leads l
    INNER JOIN sales.lead_batches lb ON l.batch_id = lb.batch_id
    LEFT JOIN (
        SELECT pd_sub.*
        FROM sales.personal_details pd_sub
        INNER JOIN (
            SELECT lead_id, MAX(updated_date) AS updated_date
            FROM sales.personal_details
            GROUP BY lead_id
        ) sub ON pd_sub.lead_id = sub.lead_id AND pd_sub.updated_date = sub.updated_date
    ) pd ON l.lead_id = pd.lead_id
    LEFT JOIN (
        SELECT lr_sub.*
        FROM sales.lead_results lr_sub
        INNER JOIN (
            SELECT lead_id, MAX(resulted_datetime) AS resulted_datetime
            FROM sales.lead_results
            GROUP BY lead_id) sub
        ON lr_sub.lead_id = sub.lead_id AND lr_sub.resulted_datetime = sub.resulted_datetime
    ) lr ON l.lead_id = lr.lead_id
    LEFT JOIN sales.users u ON u.user_id = lr.user_id
    WHERE lb.type = 'Marketing'
    AND lb.uploaded = 1
    AND l.merged = 0;

Execution plan:

Nested Loop Left Join  (cost=10485.51..17604.18 rows=34 width=158) (actual time=717.862..168709.593 rows=18001 loops=1)
  Join Filter: (l.lead_id = pd_sub.lead_id)
  Rows Removed by Join Filter: 687818215
  ->  Nested Loop Left Join  (cost=6487.82..12478.42 rows=34 width=135) (actual time=658.141..64951.950 rows=18001 loops=1)
        Join Filter: (l.lead_id = lr_sub.lead_id)
        Rows Removed by Join Filter: 435482960
        ->  Hash Join  (cost=131.01..1816.10 rows=34 width=60) (actual time=1.948..126.067 rows=17998 loops=1)
              Hash Cond: (l.batch_id = lb.batch_id)
              ->  Seq Scan on leads l  (cost=0.00..1273.62 rows=32597 width=44) (actual time=0.032..69.763 rows=32621 loops=1)
                    Filter: (merged = 0)
                    Rows Removed by Filter: 5595
              ->  Hash  (cost=130.96..130.96 rows=4 width=20) (actual time=1.894..1.894 rows=4 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on lead_batches lb  (cost=0.00..130.96 rows=4 width=20) (actual time=1.078..1.884 rows=4 loops=1)
                          Filter: (((type)::text = 'Marketing'::text) AND (uploaded = 1))
                          Rows Removed by Filter: 3866
        ->  Materialize  (cost=6356.81..10661.81 rows=1 width=79) (actual time=0.006..1.362 rows=24197 loops=17998)
              ->  Nested Loop Left Join  (cost=6356.81..10661.81 rows=1 width=79) (actual time=96.246..633.701 rows=24197 loops=1)
                    Join Filter: ((u.user_id)::text = (lr_sub.user_id)::text)
                    Rows Removed by Join Filter: 1742184
                    ->  Gather  (cost=6356.81..10659.19 rows=1 width=72) (actual time=96.203..202.086 rows=24197 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Hash Join  (cost=5356.81..9659.09 rows=1 width=72) (actual time=134.595..166.341 rows=8066 loops=3)
                                Hash Cond: ((lr_sub.lead_id = lead_results.lead_id) AND (lr_sub.resulted_datetime = (max(lead_results.resulted_datetime))))
                                ->  Parallel Seq Scan on lead_results lr_sub  (cost=0.00..3622.05 rows=44605 width=72) (actual time=0.033..17.333 rows=35684 loops=3)
                                ->  Hash  (cost=5110.36..5110.36 rows=16430 width=12) (actual time=134.260..134.260 rows=24194 loops=3)
                                      Buckets: 32768  Batches: 1  Memory Usage: 1391kB
                                      ->  HashAggregate  (cost=4781.76..4946.06 rows=16430 width=12) (actual time=122.823..129.022 rows=24204 loops=3)
                                            Group Key: lead_results.lead_id
                                            ->  Seq Scan on lead_results  (cost=0.00..4246.51 rows=107051 width=12) (actual time=0.020..71.768 rows=107051 loops=3)
                    ->  Seq Scan on users u  (cost=0.00..1.72 rows=72 width=23) (actual time=0.002..0.007 rows=73 loops=24197)
  ->  Materialize  (cost=3997.68..5030.85 rows=187 width=31) (actual time=0.003..2.033 rows=38211 loops=18001)
        ->  Hash Join  (cost=3997.68..5029.92 rows=187 width=31) (actual time=52.802..85.774 rows=38211 loops=1)
              Hash Cond: ((personal_details.lead_id = pd_sub.lead_id) AND ((max(personal_details.updated_date)) = pd_sub.updated_date))
              ->  HashAggregate  (cost=1811.38..2186.06 rows=37468 width=8) (actual time=23.330..35.345 rows=38212 loops=1)
                    Group Key: personal_details.lead_id
                    ->  Seq Scan on personal_details  (cost=0.00..1623.92 rows=37492 width=8) (actual time=0.014..4.636 rows=38232 loops=1)
              ->  Hash  (cost=1623.92..1623.92 rows=37492 width=35) (actual time=29.058..29.058 rows=38211 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 2809kB
                    ->  Seq Scan on personal_details pd_sub  (cost=0.00..1623.92 rows=37492 width=35) (actual time=0.026..17.231 rows=38232 loops=1)
Planning time: 1.966 ms
Execution time: 168731.769 ms

I have an index on lead_id on all tables, and an additional index on (type, uploaded) in lead_batches.
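
In DDL terms, the setup looks roughly like this (index names may differ from the actual schema; lead_batches_type_idx is the name visible in the plans below):

-- lead_id indexes on the child tables (leads.lead_id is already covered by its primary key)
CREATE INDEX personal_details_lead_id_idx ON sales.personal_details (lead_id);
CREATE INDEX lead_results_lead_id_idx ON sales.lead_results (lead_id);

-- the additional index on lead_batches
CREATE INDEX lead_batches_type_idx ON sales.lead_batches (type, uploaded);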

Thanks very much in advance for any assistance!

EDIT:

The execution plan without the additional WHERE constraint:

Hash Left Join  (cost=15861.46..17780.37 rows=30972 width=158) (actual time=765.076..844.512 rows=32053 loops=1)
  Hash Cond: (l.lead_id = pd_sub.lead_id)
  ->  Hash Left Join  (cost=10829.21..12630.45 rows=30972 width=135) (actual time=667.460..724.297 rows=32053 loops=1)
        Hash Cond: (l.lead_id = lr_sub.lead_id)
        ->  Hash Join  (cost=167.39..1852.48 rows=30972 width=60) (actual time=2.579..36.683 rows=32050 loops=1)
              Hash Cond: (l.batch_id = lb.batch_id)
              ->  Seq Scan on leads l  (cost=0.00..1273.62 rows=32597 width=44) (actual time=0.034..22.166 rows=32623 loops=1)
                    Filter: (merged = 0)
                    Rows Removed by Filter: 5595
              ->  Hash  (cost=121.40..121.40 rows=3679 width=20) (actual time=2.503..2.503 rows=3679 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 234kB
                    ->  Seq Scan on lead_batches lb  (cost=0.00..121.40 rows=3679 width=20) (actual time=0.011..1.809 rows=3679 loops=1)
                          Filter: (uploaded = 1)
                          Rows Removed by Filter: 193
        ->  Hash  (cost=10661.81..10661.81 rows=1 width=79) (actual time=664.855..664.855 rows=24197 loops=1)
              Buckets: 32768 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 2821kB
              ->  Nested Loop Left Join  (cost=6356.81..10661.81 rows=1 width=79) (actual time=142.634..647.146 rows=24197 loops=1)
                    Join Filter: ((u.user_id)::text = (lr_sub.user_id)::text)
                    Rows Removed by Join Filter: 1742184
                    ->  Gather  (cost=6356.81..10659.19 rows=1 width=72) (actual time=142.590..241.913 rows=24197 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Hash Join  (cost=5356.81..9659.09 rows=1 width=72) (actual time=141.250..171.403 rows=8066 loops=3)
                                Hash Cond: ((lr_sub.lead_id = lead_results.lead_id) AND (lr_sub.resulted_datetime = (max(lead_results.resulted_datetime))))
                                ->  Parallel Seq Scan on lead_results lr_sub  (cost=0.00..3622.05 rows=44605 width=72) (actual time=0.027..15.322 rows=35684 loops=3)
                                ->  Hash  (cost=5110.36..5110.36 rows=16430 width=12) (actual time=140.917..140.917 rows=24194 loops=3)
                                      Buckets: 32768  Batches: 1  Memory Usage: 1391kB
                                      ->  HashAggregate  (cost=4781.76..4946.06 rows=16430 width=12) (actual time=127.911..135.076 rows=24204 loops=3)
                                            Group Key: lead_results.lead_id
                                            ->  Seq Scan on lead_results  (cost=0.00..4246.51 rows=107051 width=12) (actual time=0.020..74.626 rows=107051 loops=3)
                    ->  Seq Scan on users u  (cost=0.00..1.72 rows=72 width=23) (actual time=0.002..0.006 rows=73 loops=24197)
  ->  Hash  (cost=5029.92..5029.92 rows=187 width=31) (actual time=97.561..97.561 rows=38213 loops=1)
        Buckets: 65536 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 2660kB
        ->  Hash Join  (cost=3997.68..5029.92 rows=187 width=31) (actual time=52.712..85.099 rows=38213 loops=1)
              Hash Cond: ((personal_details.lead_id = pd_sub.lead_id) AND ((max(personal_details.updated_date)) = pd_sub.updated_date))
              ->  HashAggregate  (cost=1811.38..2186.06 rows=37468 width=8) (actual time=23.831..35.015 rows=38214 loops=1)
                    Group Key: personal_details.lead_id
                    ->  Seq Scan on personal_details  (cost=0.00..1623.92 rows=37492 width=8) (actual time=0.012..4.995 rows=38234 loops=1)
              ->  Hash  (cost=1623.92..1623.92 rows=37492 width=35) (actual time=28.468..28.468 rows=38213 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 2809kB
                    ->  Seq Scan on personal_details pd_sub  (cost=0.00..1623.92 rows=37492 width=35) (actual time=0.024..17.089 rows=38234 loops=1)
Planning time: 2.058 ms
Execution time: 849.460 ms

The execution plan with nested loop joins disabled (SET enable_nestloop = off):

Hash Left Join  (cost=13088.17..17390.71 rows=34 width=158) (actual time=277.646..343.924 rows=18001 loops=1)
  Hash Cond: (l.lead_id = pd_sub.lead_id)
  ->  Hash Right Join  (cost=8055.91..12358.31 rows=34 width=135) (actual time=181.614..238.365 rows=18001 loops=1)
        Hash Cond: (lr_sub.lead_id = l.lead_id)
        ->  Hash Left Join  (cost=6359.43..10661.82 rows=1 width=79) (actual time=156.498..201.533 rows=24197 loops=1)
              Hash Cond: ((lr_sub.user_id)::text = (u.user_id)::text)
              ->  Gather  (cost=6356.81..10659.19 rows=1 width=72) (actual time=156.415..190.934 rows=24197 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Hash Join  (cost=5356.81..9659.09 rows=1 width=72) (actual time=143.387..178.653 rows=8066 loops=3)
                          Hash Cond: ((lr_sub.lead_id = lead_results.lead_id) AND (lr_sub.resulted_datetime = (max(lead_results.resulted_datetime))))
                          ->  Parallel Seq Scan on lead_results lr_sub  (cost=0.00..3622.05 rows=44605 width=72) (actual time=0.036..22.404 rows=35684 loops=3)
                          ->  Hash  (cost=5110.36..5110.36 rows=16430 width=12) (actual time=143.052..143.052 rows=24194 loops=3)
                                Buckets: 32768  Batches: 1  Memory Usage: 1391kB
                                ->  HashAggregate  (cost=4781.76..4946.06 rows=16430 width=12) (actual time=131.793..137.760 rows=24204 loops=3)
                                      Group Key: lead_results.lead_id
                                      ->  Seq Scan on lead_results  (cost=0.00..4246.51 rows=107051 width=12) (actual time=0.023..78.918 rows=107051 loops=3)
              ->  Hash  (cost=1.72..1.72 rows=72 width=23) (actual time=0.061..0.061 rows=73 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 12kB
                    ->  Seq Scan on users u  (cost=0.00..1.72 rows=72 width=23) (actual time=0.031..0.039 rows=73 loops=1)
        ->  Hash  (cost=1696.05..1696.05 rows=34 width=60) (actual time=25.068..25.068 rows=17998 loops=1)
              Buckets: 32768 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 2084kB
              ->  Hash Join  (cost=10.96..1696.05 rows=34 width=60) (actual time=0.208..18.630 rows=17998 loops=1)
                    Hash Cond: (l.batch_id = lb.batch_id)
                    ->  Seq Scan on leads l  (cost=0.00..1273.62 rows=32597 width=44) (actual time=0.043..13.065 rows=32623 loops=1)
                          Filter: (merged = 0)
                          Rows Removed by Filter: 5595
                    ->  Hash  (cost=10.91..10.91 rows=4 width=20) (actual time=0.137..0.137 rows=4 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Index Scan using lead_batches_type_idx on lead_batches lb  (cost=0.28..10.91 rows=4 width=20) (actual time=0.091..0.129 rows=4 loops=1)
                                Index Cond: ((type)::text = 'Marketing'::text)
                                Filter: (uploaded = 1)
  ->  Hash  (cost=5029.92..5029.92 rows=187 width=31) (actual time=96.005..96.005 rows=38213 loops=1)
        Buckets: 65536 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 2660kB
        ->  Hash Join  (cost=3997.68..5029.92 rows=187 width=31) (actual time=52.166..84.592 rows=38213 loops=1)
              Hash Cond: ((personal_details.lead_id = pd_sub.lead_id) AND ((max(personal_details.updated_date)) = pd_sub.updated_date))
              ->  HashAggregate  (cost=1811.38..2186.06 rows=37468 width=8) (actual time=23.785..34.403 rows=38214 loops=1)
                    Group Key: personal_details.lead_id
                    ->  Seq Scan on personal_details  (cost=0.00..1623.92 rows=37492 width=8) (actual time=0.013..4.680 rows=38234 loops=1)
              ->  Hash  (cost=1623.92..1623.92 rows=37492 width=35) (actual time=27.960..27.960 rows=38213 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 2809kB
                    ->  Seq Scan on personal_details pd_sub  (cost=0.00..1623.92 rows=37492 width=35) (actual time=0.019..15.350 rows=38234 loops=1)
Planning time: 2.469 ms
Execution time: 346.590 ms
asked Nov 06 '22 by JimRollins


1 Answer

You are basically missing some important indexes here.

To test improvements, I set up the tables myself and filled them with test data of a similar distribution to what your explain plans show.

My baseline performance was ~160 seconds: https://explain.depesz.com/s/WlKO

The first thing I did was to create indexes for the foreign key references (although not all of them will turn out to be necessary):

CREATE INDEX idx_personal_details_leads ON sales.personal_details (lead_id);
CREATE INDEX idx_leads_batches ON sales.leads (batch_id);
CREATE INDEX idx_lead_results_users ON sales.lead_results (user_id);

That brought us down to ~112 seconds: https://explain.depesz.com/s/aRcf

Now, most of the time is actually spent on the self-joins (table personal_details restricted to the latest updated_date and table lead_results restricted to the latest resulted_datetime). Based on this, I came up with the following two indexes:

CREATE INDEX idx_personal_details_updated ON sales.personal_details (lead_id, updated_date DESC);
CREATE INDEX idx_lead_results_resulted ON sales.lead_results (lead_id, resulted_datetime DESC);

...which then immediately brings us down to ~110 milliseconds: https://explain.depesz.com/s/dDfk
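
As an aside, these (lead_id, ... DESC) indexes also pair naturally with PostgreSQL's DISTINCT ON, an alternative way to express the "latest row per lead" subqueries. A minimal sketch for the lead_results case (not a drop-in replacement for the full query; NULLS LAST mirrors how MAX() ignores NULL datetimes):

-- exactly one row per lead: the one with the latest resulted_datetime
SELECT DISTINCT ON (lead_id) *
FROM sales.lead_results
ORDER BY lead_id, resulted_datetime DESC NULLS LAST;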

Debugging help

To debug which indexes were most effective, I first rewrote the query to eliminate the sub-selects, using a dedicated CTE for each of them instead:

WITH                                             
    leads_update_latest AS (
        SELECT lead_id, MAX(updated_date) AS updated_date
        FROM sales.personal_details
        GROUP BY lead_id
    ),                                    
    pd AS (
        SELECT pd_sub.*                                      
        FROM sales.personal_details pd_sub
        INNER JOIN leads_update_latest sub ON (pd_sub.lead_id = sub.lead_id AND pd_sub.updated_date = sub.updated_date)
    ),                                                                                  
    leads_result_latest AS (
        SELECT lead_id, MAX(resulted_datetime) AS resulted_datetime
        FROM sales.lead_results
        GROUP BY lead_id
    ),              
    lr AS (                                                            
        SELECT lr_sub.*
        FROM sales.lead_results lr_sub
        INNER JOIN leads_result_latest sub ON (lr_sub.lead_id = sub.lead_id AND lr_sub.resulted_datetime = sub.resulted_datetime)
    ),                            
    leads AS (                                       
        SELECT l.*
        FROM sales.leads l
        INNER JOIN sales.lead_batches lb ON (l.batch_id = lb.batch_id)
        WHERE lb.type = 'Marketing'
        AND lb.uploaded = 1
        AND l.merged = 0
    )
SELECT l.*, pd.*, lr.resulted_datetime, u.name
FROM leads l
LEFT JOIN pd ON l.lead_id = pd.lead_id
LEFT JOIN lr ON l.lead_id = lr.lead_id
LEFT JOIN sales.users u ON u.user_id = lr.user_id
;

Surprisingly, just rewriting the query into CTEs made the planner choose a much better plan, and the query took only ~2.3 seconds without any of the indexes: https://explain.depesz.com/s/lqzq. Presumably each CTE acts as an optimization fence here, preventing the planner from collapsing the sub-selects back into the badly misestimated nested loops.

...with optimization: the FK indexes bring it down to ~230 milliseconds: https://explain.depesz.com/s/a6wT

However, with the other combined indexes, the CTE version degraded: the combined reverse indexes push it up to ~270 milliseconds: https://explain.depesz.com/s/TNNm

However, while these combined indexes speed up the original query a lot, they also grow much faster than single-column indexes and add write cost on every insert and update, which has to be accounted for with regard to DB scalability.
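
To keep an eye on that cost, PostgreSQL's standard statistics views can report each index's size and how often it is actually used (a generic query, nothing specific to this schema):

-- size and usage of every index in the sales schema
SELECT indexrelname AS index_name,
       idx_scan AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'sales'
ORDER BY pg_relation_size(indexrelid) DESC;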

As a result, it might make sense to go for the CTE version: it performs a bit slower, but still fast enough, and it allows omitting the two additional combined indexes that the DB would otherwise have to maintain.
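
One caveat on the CTE approach: the optimization-fence behavior is version-dependent. From PostgreSQL 12 on, single-reference CTEs like these are inlined by default, so the rewrite alone may no longer change the plan; marking the CTEs as MATERIALIZED restores the pre-12 behavior. A minimal sketch:

-- PostgreSQL 12+: force the CTE to be materialized as its own step
WITH pd AS MATERIALIZED (
    SELECT lead_id, MAX(updated_date) AS updated_date
    FROM sales.personal_details
    GROUP BY lead_id
)
SELECT * FROM pd;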

answered Nov 14 '22 by Ancoron