Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Encapsulating Postgres query in view makes it extremely slow

I have a query that runs in about 5 seconds on Postgres 8.4. It selects data from a view joined to some other tables, but also uses the lag() window function, ie.

SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v
JOIN othertables USING (...)
WHERE ...

For convenience I created a new view that simply has

SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v

and then SELECT from that, using all the other JOINs and filters as before. To my surprise this query doesn't complete in 12 minutes (I stopped it at that point). Obviously Postgres has chosen a different execution plan. How do I get it to not do that, ie. use the same plan as in the original query? I would have thought that a view shouldn't change the execution plan, but apparently it does.

Edit: what's more, I found that even if I copy the contents of the first view into the second it still doesn't return.

Edit 2: OK, I've simplified the query sufficiently to post the plans.

Using the view (this doesn't return in any reasonable time):

Subquery Scan sp  (cost=5415201.23..5892463.97 rows=88382 width=370)
  Filter: (((sp.ticker)::text ~~ 'Some Ticker'::text) AND (sp.price_date >= '2010-06-01'::date))
  ->  WindowAgg  (cost=5415201.23..5680347.20 rows=53029193 width=129)
        ->  Sort  (cost=5415201.23..5441715.83 rows=53029193 width=129)
              Sort Key: sp.stock_id, sp.price_date
              ->  Hash Join  (cost=847.87..1465139.61 rows=53029193 width=129)
                    Hash Cond: (sp.stock_id = s.stock_id)
                    ->  Seq Scan on stock_prices sp  (cost=0.00..1079829.20 rows=53029401 width=115)
                    ->  Hash  (cost=744.56..744.56 rows=29519 width=18)
                          ->  Seq Scan on stocks s  (cost=0.00..744.56 rows=29519 width=18)

Taking the window function out of the view and putting into the query itself (this returns instantly):

WindowAgg  (cost=34.91..34.95 rows=7 width=129)
  ->  Sort  (cost=34.91..34.92 rows=7 width=129)
        Sort Key: sp.stock_id, sp.price_date
        ->  Nested Loop  (cost=0.00..34.89 rows=7 width=129)
              ->  Index Scan using stocks_ticker_unique on stocks s  (cost=0.00..4.06 rows=1 width=18)
                    Index Cond: ((ticker)::text = 'Some Ticker'::text)
                    Filter: ((ticker)::text ~~ 'Some Ticker'::text)
              ->  Index Scan using stock_prices_id_date_idx on stock_prices sp  (cost=0.00..30.79 rows=14 width=115)
                    Index Cond: ((sp.stock_id = s.stock_id) AND (sp.price_date >= '2010-06-01'::date))

So it seems that in the slow case it's trying to apply the window function to all the data first and then filter it, which is probably the issue. I don't know why it's doing that, though.

like image 387
EMP Avatar asked Aug 19 '10 23:08

EMP


1 Answers

Your difference between the two plans comes from joining with an aggregate. This prevents the use a nested loop plan. When you use the aggregate in your view, you put yourself in that unfavorable scenario.

This, for instance, will almost always lead to a merge or hash join plan on the two tables followed by a top-n sort:

select foo.*
from foo
join (select bar.* from bar group by bar.field) as bar on foo.field = bar.field
where ...
order by bar.field
limit 10;
like image 110
Denis de Bernardy Avatar answered Nov 15 '22 05:11

Denis de Bernardy