Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Improve PostgreSQL query performance

When running this query in my server it's very slow, and I can't understand why. Can anyone help me figure it out?
Query:

SELECT
    "t_dat"."t_year" AS "c0",
    "t_dat"."t_month" AS "c1",
    "t_dat"."t_week" AS "c2",
    "t_dat"."t_day" AS "c3",
    "t_purs"."p_id" AS "c4",
    sum("t_purs"."days") AS "m0",
    sum("t_purs"."timecreated") AS "m1"
FROM "t_dat", "t_purs"
WHERE "t_purs"."created" = "t_dat"."t_key"
  AND "t_dat"."t_year" = 2013
  AND "t_dat"."t_month" = 3
  AND "t_dat"."t_week" = 9
  AND "t_dat"."t_day" IN (1,2)
  AND "t_purs"."p_id" IN (
      '4','15','18','19','20','29',
      '31','35','46','56','72','78')
GROUP BY
    "t_dat"."t_year",
    "t_dat"."t_month",
    "t_dat"."t_week",
    "t_dat"."t_day",
    "t_purs"."p_id"

Explain Analyze:

HashAggregate  (cost=12252.04..12252.04 rows=1 width=28) (actualtime=10212.374..10212.384 rows=10 loops=1)
  ->  Nested Loop  (cost=0.00..12252.03 rows=1 width=28) (actual time=3016.006..10212.249 rows=14 loops=1)
        Join Filter: (t_dat.t_key = t_purs.created)
        ->  Seq Scan on t_dat  (cost=0.00..129.90 rows=1 width=20) (actual time=0.745..2.040 rows=48 loops=1)
              Filter: ((t_day = ANY ('{1,2}'::integer[])) AND (t_year = 2013) AND (t_month = 3) AND (t_week = 9))
        ->  Seq Scan on t_purs  (cost=0.00..12087.49 rows=9900 width=16) (actual time=0.018..201.630 rows=14014 loops=48)
              Filter: (p_id = ANY ('{4,15,18,19,20,29,31,35,46,56,72,78}'::integer[]))
Total runtime: 10212.470 ms
like image 308
Eli_Rozen Avatar asked Mar 03 '13 19:03

Eli_Rozen


3 Answers

It is difficult to say what exactly you are missing, but if I were you, I would make sure that following index exists:

CREATE INDEX t_dat_id_date_idx
    ON t_dat (t_key, t_year, t_month, t_week, t_day);

For t_purs, create this index:

CREATE INDEX t_purs_created_p_id_idx
    ON t_purs (created, p_id);
like image 56
mvp Avatar answered Sep 18 '22 06:09

mvp


Consider using a single column in your table:

t_date date

instead of (t_year, t_month, t_week, t_day). The data type date occupies 4 byte. That would shrink your table a bit, make the index smaller and faster and grouping a lot easier.

Year, month, week and day can easily and quickly be extracted from a date with extract(). Your query could then look like this and would be faster:

SELECT extract (year  FROM t_date) AS c0
      ,extract (month FROM t_date) AS c1
      ,extract (week  FROM t_date) AS c2
      ,extract (day   FROM t_date) AS c3
      ,p.p_id                      AS c4
      ,sum(p.days)                 AS m0
      ,sum(p.timecreated)          AS m1
FROM   t_dat  d
JOIN   t_purs p ON p.created = d.t_key
WHERE  d.t_date IN ('2013-03-01'::date, '2013-03-02'::date)
AND    p.p_id IN (4,15,18,19,20,29,31,35,46,56,72,78)
GROUP  BY d.t_date, p.p_id;

More important for performance is the index, which would then simply be:

CREATE INDEX t_dat_date_idx ON t_dat (t_key, t_date);

Or, depending on data distribution:

CREATE INDEX t_dat_date_idx ON t_dat (t_date, t_key);

The sequence of column matters. You may even create both.

like image 23
Erwin Brandstetter Avatar answered Sep 21 '22 06:09

Erwin Brandstetter


enter image description here

Your query is having sequential scans on t_purs and t_dat. Creating the appropriate indexes will help you make this query faster and avoid sequential scans.

create index index_name on t_purs(created) where created is not null;
create index index_name on t_dat using btree(t_key, t_year, t_month, t_week, t_day)

Run explain analyze after running the above two queries. You'll see the plan time and execution time will be reduced.

like image 37
Pratyush Srivastava Avatar answered Sep 22 '22 06:09

Pratyush Srivastava