I would like to SELECT a bunch of rows from table A, along with the results of aggregate functions like avg(A.price) and avg(A.distance).
Now, the SELECT query takes a good bit of time, so I don't want to run one query to get the rows and another to get the averages. If I did that, I'd be running the query to SELECT the appropriate rows twice.
But looking at the PostgreSQL window function documentation (http://www.postgresql.org/docs/9.1/static/tutorial-window.html), it seems that using a window function to return the aggregate results alongside the returned rows means that every single row returned would contain those results. And in my case, since the aggregation is over all the rows returned by the main SELECT query and not a subset of its rows, this seems wasteful.
What are the performance implications of returning the same avg() many times, given that I'm selecting a subset of the rows in A but doing aggregate queries across the entire subset? In particular, does Postgres recompute the average every time, or does it cache the average somehow?
By way of analogy: look at the window function docs and pretend that depname is 'develop' for every row returned by the SELECT query, so that the average is the same for every row because it was computed across all returned rows. How many times is that average computed?
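To make the question concrete, here is a minimal sketch of the kind of query being asked about. The inline VALUES list stands in for table A, and the region filter column is an assumption made for illustration; only price and distance come from the question.

```sql
-- Hypothetical stand-in for table A; only price and distance come from the question.
SELECT id, price, distance,
       avg(price)    OVER () AS avg_price,     -- empty OVER (): the whole filtered result is one partition
       avg(distance) OVER () AS avg_distance
FROM (VALUES (1, 'north', 10, 100),
             (2, 'north', 20, 300),
             (3, 'south', 99, 999)) AS a(id, region, price, distance)
WHERE region = 'north';
```

Both returned rows carry the same two values, the averages over the two 'north' rows (15 and 200). The question is whether producing them costs one computation or one per row.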
Window functions make it possible to elegantly express many useful query types, including time series analysis, ranking, percentiles, moving averages, and cumulative sums. Formulating such queries in plain SQL-92 is usually both cumbersome and inefficient.
In general, you can optimize window functions by following these rules: In the index, sort on the columns of the PARTITION BY clause first, then on the columns used in the ORDER BY clause. Include any other column referenced in the query as included columns of the index.
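As a rough illustration of that indexing rule in PostgreSQL, the sketch below uses a hypothetical trips table (all table, column, and index names are assumptions). The INCLUDE clause needs PostgreSQL 11 or later, and whether the planner actually uses the index depends on the data, so it is worth checking with EXPLAIN.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE trips (
    id        bigint PRIMARY KEY,
    region    text,
    booked_at timestamptz,
    price     numeric
);

-- PARTITION BY column first, then the ORDER BY column;
-- INCLUDE (PostgreSQL 11 or later) carries the remaining referenced column.
CREATE INDEX trips_region_booked_at_idx
    ON trips (region, booked_at) INCLUDE (price);

-- A query whose window definition matches the index order.
SELECT region, booked_at,
       avg(price) OVER (PARTITION BY region ORDER BY booked_at) AS running_avg
FROM trips;
```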
Window functions are often more efficient than using a cross join.
A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows: the virtual table produced by the query's FROM, WHERE, GROUP BY, and HAVING clauses.
According to section 7.2.4 of the PostgreSQL documentation:
When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data.
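One way to check this on your own query is to give both aggregates the same named window and look at the plan. The sketch below uses inline sample data and an assumed region column. With a shared window definition one would expect a single WindowAgg node in the EXPLAIN output, though the exact plan text varies by PostgreSQL version.

```sql
-- Both aggregates reference the same named window w,
-- so their window definitions are trivially equivalent.
EXPLAIN
SELECT id,
       avg(price)    OVER w AS avg_price,
       avg(distance) OVER w AS avg_distance
FROM (VALUES (1, 'north', 10, 100),
             (2, 'north', 20, 300),
             (3, 'south', 99, 999)) AS a(id, region, price, distance)
WINDOW w AS (PARTITION BY region);
```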
You can use a CTE to do what you want. According to the Postgres documentation:
A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as stated, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
You can structure your final query like this:
with cte as (
    <your basic select goes here>
)
select *
from cte
cross join (
    select <averages here>
    from cte
) const
where <your row filter here>;
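Filled in with sample data, the template above might look like the following sketch. The inline VALUES list and the region filter are assumptions standing in for the expensive SELECT on table A. Here the filter sits inside the CTE, so the averages cover exactly the rows that are returned.

```sql
WITH cte AS (
    -- Stands in for the expensive SELECT; data and filter column are hypothetical.
    SELECT *
    FROM (VALUES (1, 'north', 10, 100),
                 (2, 'north', 20, 300),
                 (3, 'south', 99, 999)) AS a(id, region, price, distance)
    WHERE region = 'north'
)
SELECT cte.*, const.avg_price, const.avg_distance
FROM cte
CROSS JOIN (
    -- One-row subquery: each average is computed once over the CTE's rows.
    SELECT avg(price) AS avg_price, avg(distance) AS avg_distance
    FROM cte
) AS const;
```

The averages still appear on every output row, but they come from a one-row subquery joined to each row rather than from a per-row window evaluation.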