Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)
For example you can do
SELECT name, first_value() over (partition by name order by date) from table1
But is there a way to do something like:
SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1
Where we are applying two functions onto the same window?
Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html
PARTITION BY vs.The GROUP BY clause is used often used in conjunction with an aggregate function such as SUM() and AVG() . The GROUP BY clause reduces the number of rows returned by rolling them up and calculating the sums or averages for each group.
Nested window functions include two functions that you can nest as an argument of a window aggregate function. Those are the nested row number function, and the nested value_of expression at row function.
But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
Window functions operate on a set of rows and return a single value for each row from the underlying query. The OVER() clause differentiates window functions from other analytical and reporting functions.
Can you not just use the window per selection
Something like
SELECT name, first_value() OVER (partition by name order by date) as f, last_value() OVER (partition by name order by date) as l from table1
Also from your reference you can do it like this
SELECT sum(salary) OVER w, avg(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
Warning : I don't delete this answer since it seems technically correct and therefore may be helpful, but beware that PARTITION BY bar ORDER BY foo
is probably not what you want to do anyway. Indeed, aggregate functions won't compute the partition elements as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)
(see proof at the end of the answer).
Though it doesn't improve performance per se, if you use multiple times the same partition, you probably want to use the second syntax proposed by astander, and not only because it's cheaper to write. Here is why.
Consider the following query :
SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
Since in principle the ordering has no effect on the computation of the average, you might be tempted to use the following query instead (no ordering on the second partition) :
SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
This is a big mistake, as it will take much longer. Proof :
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1) -> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1) Sort Key: bar, foo Sort Method: quicksort Memory: 130006kB -> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1) Total runtime: 2458.969 ms (6 lignes) > EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1) -> WindowAgg (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1) -> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1) Sort Key: bar, foo Sort Method: quicksort Memory: 130006kB -> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1) Total runtime: 3060.041 ms (7 lignes)
Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have ten times or more the same partition and you are updating it over days, it is quite easy to forget to add the ORDER BY
clause on a partition which doesn't need it by itself.
Here comes the WINDOW
syntax, which will prevent you from such careless mistakes (provided, of course, you're aware it's better to minimize the number of different window functions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE
) to the first query :
SELECT array_agg(foo) OVER qux, avg(baz) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY bar)
I understand the statement that "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)
" seems questionable, so here is an example :
# SELECT * FROM foobar; foo | bar -----+----- 1 | 1 2 | 2 3 | 1 4 | 2 (4 lines) # SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar); array_agg | avg -----------+----- {1,3} | 2 {1,3} | 2 {2,4} | 3 {2,4} | 3 (4 lines) # SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo); array_agg | avg -----------+----- {1} | 1 {1,3} | 2 {2} | 2 {2,4} | 3 (4 lines)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With