Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow query optimisation in Postgres

We have a performance issue with a specific SQL query and we're trying to figure out how could we improve here. It's execution time on is about 20 - 100 seconds!

Here is the query and it's explain:

SELECT  "jobs".* FROM "jobs"
  WHERE "jobs"."status" IN (1, 2, 3, 4)
  ORDER BY "jobs"."due_date" ASC
  LIMIT 5;


Limit  (cost=0.42..1844.98 rows=5 width=2642) (actual time=16927.150..18151.643 rows=1 loops=1)
   ->  Index Scan using index_jobs_on_due_date on jobs  (cost=0.42..1278647.41 rows=3466 width=2642) (actual time=16927.148..18151.641 rows=1 loops=1)
         Filter: (status = ANY ('{1,2,3,4}'::integer[]))
         Rows Removed by Filter: 595627
 Planning time: 0.205 ms
 Execution time: 18151.684 ms

We are using PostgreSQL 9.6.11 on AWS RDS.

In a table we have ~500K rows. Fields realted to the query are:

  • due_date(timestamp without time zone, can be null)
  • status(integer, not null)

We have the following indexes:

CREATE INDEX index_jobs_on_due_date ON public.jobs USING btree (due_date)
CREATE INDEX index_jobs_on_due_date_and_status ON public.jobs USING btree (due_date, status)
CREATE INDEX index_jobs_on_status ON public.jobs USING btree (status)
CREATE UNIQUE INDEX jobs_pkey ON public.jobs USING btree (id)

Thank you in advance, - Jack

like image 899
Jack Juiceson Avatar asked Jun 04 '19 10:06

Jack Juiceson


People also ask

How make PostgreSQL query run faster?

Some of the tricks we used to speed up SELECT-s in PostgreSQL: LEFT JOIN with redundant conditions, VALUES, extended statistics, primary key type conversion, CLUSTER, pg_hint_plan + bonus. Photo by Richard Jacobs on Unsplash.

Does PostgreSQL optimize queries?

Just like any advanced relational database, PostgreSQL uses a cost-based query optimizer that tries to turn your SQL queries into something efficient that executes in as little time as possible.

What type of optimization algorithm is PostgreSQL using?

Whitley's Genitor algorithm. The GEQO module allows the PostgreSQL query optimizer to support large join queries effectively through non-exhaustive search.


1 Answers

For this query:

SELECT  j.*
FROM "jobs" j
WHERE j."status" IN (1, 2, 3, 4)
ORDER BY "jobs"."due_date" ASC
LIMIT 5;

The "obvious" index is on (status). But that may not help. The goal is to get rid of the sorting. So, you can rewrite the query and use an index jobs(status, due_date):

select j.*
from ((select j.*
       from jobs j
       where j.status = 1
       order by j.due_date asc
       limit 5
      ) union all
      (select j.*
       from jobs j
       where j.status = 2
       order by j.due_date asc
       limit 5
      ) union all
      (select j.*
       from jobs j
       where j.status = 3
       order by j.due_date asc
       limit 5
      ) union all
      (select j.*
       from jobs j
       where j.status = 4
       order by j.due_date asc
       limit 5
      )
     ) j
order by due_date
limit 5;

The subqueries should each use the composite index. The final sort would then be on (at most) 20 rows, which should be fast).

EDIT:

Here is a related idea, with the same index:

SELECT j.*
FROM (SELECT  j.*,
              ROW_NUMBER() OVER (PARTITION BY j.status ORDER BY j.due_date ASC) as seqnum
      FROM "jobs" j
     ) j
WHERE j.status in (1, 2, 3, 4) AND seqnum <= 5
ORDER BY j.due_date ASC
LIMIT 5;

This can use the index for the ROW_NUMBER() calculation. That might require a full table scan of the table. But, the final sort will be limited to 20 rows, so the final sort is eliminated.

like image 63
Gordon Linoff Avatar answered Sep 24 '22 11:09

Gordon Linoff