SQL window functions: Performance impact of returning the same avg() many times?

Tags:

postgresql

I would like to SELECT a bunch of rows from table A, along with the results of aggregate functions like avg(A.price) and avg(A.distance).

Now, the SELECT query takes a good while to run, so I don't want to run one query to get the rows and another to get the averages. If I did that, I'd be executing the expensive row selection twice.

But looking at the PostgreSQL window function documentation (http://www.postgresql.org/docs/9.1/static/tutorial-window.html), it seems that using a window function to return the aggregate results alongside the returned rows means that every single row would carry a copy of those results. In my case, since the aggregation is over all the rows returned by the main SELECT query and not over a subset of them, this seems wasteful.

What are the performance implications of returning the same avg() many times, given that I'm selecting a subset of the rows in A but doing aggregate queries across the entire subset? In particular, does Postgres recompute the average every time, or does it cache the average somehow?

By way of analogy: look at the example in the window function docs and pretend that depname is 'develop' for every row returned by the SELECT query, so that the average is the same for every row because it was computed across all returned rows. How many times is that average computed?
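For concreteness, a query of the kind described above might look roughly like the following. This is only a sketch: table A and the price and distance columns come from the question, while the WHERE condition is a made-up placeholder for the real, expensive filter.

```sql
-- Hypothetical filter; only A, price and distance are from the question.
SELECT t.*,
       avg(t.price)    OVER () AS avg_price,     -- same value repeated on every row
       avg(t.distance) OVER () AS avg_distance   -- same value repeated on every row
FROM   A AS t
WHERE  t.distance < 100;
```

The empty OVER () makes the window the entire filtered result set, which is why every returned row carries the same two averages.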


asked May 02 '13 16:05

skyw

2 Answers

According to section 7.2.4 of the PostgreSQL documentation:

When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data.
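One way to check this against your own data is to look at the query plan. The sketch below assumes the table A from the question and a hypothetical filter on distance; both aggregates share one named, empty window, so their window definitions are syntactically equivalent and the plan would be expected to show a single WindowAgg node rather than one per aggregate.

```sql
-- Hypothetical filter; table A and its columns come from the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.*,
       avg(t.price)    OVER w AS avg_price,
       avg(t.distance) OVER w AS avg_distance
FROM   A AS t
WHERE  t.distance < 100
WINDOW w AS ();   -- one shared, empty window: the whole filtered result set
```

Comparing the timing reported for the WindowAgg node with the timing of the scan beneath it should give a rough idea of how much the repeated averages actually cost on top of the row selection itself.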


answered Sep 28 '22 10:09

John Velonis

You can use a CTE to do what you want. According to the Postgres documentation:

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as stated, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

You can build your final result with a query shaped like this:

with cte as (<your basic select goes here>)
select *
from cte cross join
     (select <averages here>
      from cte
     ) const
where <your row filter here>
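Filled in for the table from the question, the template might look like the sketch below. The filter on distance and the avg_price / avg_distance aliases are assumptions made for illustration, not part of the original question.

```sql
WITH cte AS (
    -- The expensive base query; this filter is a hypothetical placeholder.
    SELECT *
    FROM   A
    WHERE  distance < 100
)
SELECT cte.*, const.avg_price, const.avg_distance
FROM   cte
CROSS JOIN (
    -- A single row of aggregates computed over the whole CTE.
    SELECT avg(price)    AS avg_price,
           avg(distance) AS avg_distance
    FROM   cte
) AS const;
```

The quoted passage describes the behavior of older PostgreSQL releases. From PostgreSQL 12 onward a CTE may be inlined into the parent query when it is referenced only once; here it is referenced twice, so it should still be computed once by default, and writing WITH cte AS MATERIALIZED (...) makes that intent explicit.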

answered Sep 28 '22 08:09

Gordon Linoff
