I want to do a special request on my database (PostgreSQL v9.4.5), but I don't manage to do it.
In order to simply, let's say I have the following table AvgTemperatures, representing different averages of temperature taken in different cities, and calculated on different length of time (counted in months) :
id | city | avg | months
----+-----------+------+--------
1 | New-York | 20 | 3 <--- average temperate over the last 3 months
2 | New-York | 19 | 6 <--- average temperate over the last 6 months
3 | New-York | 15 | 12 <--- etc
4 | New-York | 15 | 24
5 | Boston | 13 | 3
6 | Boston | 18 | 8
7 | Boston | 17 | 12
8 | Boston | 16 | 15
9 | Chicago | 12 | 2
10 | Chicago | 14 | 12
11 | Miami | 28 | 1
12 | Miami | 25 | 4
13 | Miami | 21 | 12
14 | Miami | 22 | 15
15 | Miami | 20 | 24
Now, imagine that I want to select all the rows concerning the measures in a city where at least one average has been over 19 degrees. In this case I want :
id | city | avg | months
----+-----------+------+--------
1 | New-York | 20 | 3
2 | New-York | 19 | 6
3 | New-York | 15 | 12
4 | New-York | 15 | 24
11 | Miami | 28 | 1
12 | Miami | 25 | 4
13 | Miami | 21 | 12
14 | Miami | 22 | 15
15 | Miami | 20 | 24
I could do something like :
SELECT *
FROM AvgTemperatures
WHERE MIN(avg) OVER (PARTITION BY city) > 16
But :
********** Erreur **********
ERROR: window functions not allowed in WHERE clause
What's more, I cannot use GROUP BY
as in :
SELECT *
FROM AvtTemperatures
GROUP BY city
HAVING MIN(avg) > 16
because I will lose information due to the aggregation (by the way this query is not valid because of the "SELECT *").
I'm pretty sure I can use the OVER PARTITION BY
to solve that, but I don't know how. Does someone have an idea ?
You cannot use window functions in WHERE , GROUP BY , or HAVING .
The WINDOW clause in a query specifies records in a stream partitioned by the time range interval or the number of rows, and an additional optional set of columns specified by the PARTITION BY clause. You can define named or inline window specifications that can be used in analytic functions and streaming JOIN clauses.
Window functions operate on a set of rows and return a single aggregated value for each row. The term Window describes the set of rows in the database on which the function will operate. We define the Window (set of rows on which functions operates) using an OVER() clause.
All-at-once operation:
"All-at-Once Operations" means that all expressions in the same logical query process phase are evaluated logically at the same time.
And great chapter Impact on Window Functions:
Suppose you have:
CREATE TABLE Test ( Id INT) ;
INSERT INTO Test VALUES ( 1001 ), ( 1002 ) ;
SELECT Id
FROM Test
WHERE Id = 1002
AND ROW_NUMBER() OVER(ORDER BY Id) = 1;
All-at-Once operations tell us these two conditions evaluated logically at the same point of time. Therefore, SQL Server can evaluate conditions in WHERE clause in arbitrary order, based on estimated execution plan. So the main question here is which condition evaluates first.
Case 1:
If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )
Result: 1002
Case 2:
If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )
Result: empty
So we have a paradox.
This example shows why we cannot use Window Functions in WHERE clause. You can think more about this and find why Window Functions are allowed to be used just in SELECT and ORDER BY clauses!
To get what you want you can wrap windowed function with CTE/subquery
as in Gordon answer:
;WITH cte AS
(
SELECT t.*, MAX(AVG) OVER (PARTITION BY city) AS average
FROM avgTemperatures t
)
SELECT *
FROM cte
where average > 19
ORDER BY id;
db<>fiddle demo
Output:
╔═════╦══════════╦═════╦═════════╗
║ id ║ city ║ avg ║ months ║
╠═════╬══════════╬═════╬═════════╣
║ 1 ║ New-York ║ 20 ║ 3 ║
║ 2 ║ New-York ║ 19 ║ 6 ║
║ 3 ║ New-York ║ 15 ║ 12 ║
║ 4 ║ New-York ║ 15 ║ 24 ║
║ 11 ║ Miami ║ 28 ║ 1 ║
║ 12 ║ Miami ║ 25 ║ 4 ║
║ 13 ║ Miami ║ 21 ║ 12 ║
║ 14 ║ Miami ║ 22 ║ 15 ║
║ 15 ║ Miami ║ 20 ║ 24 ║
╚═════╩══════════╩═════╩═════════╝
The simplest solution is to use the bool_or
aggregate function
select id, city, avg, months
from avttemperatures
where city in (
select city
from avttemperatures
group by 1
having bool_or(avg > 19)
)
order by 2, 4
;
id | city | avg | months
----+----------+-----+--------
11 | Miami | 28 | 1
12 | Miami | 25 | 4
13 | Miami | 21 | 12
14 | Miami | 22 | 15
15 | Miami | 20 | 24
1 | New-York | 20 | 3
2 | New-York | 19 | 6
3 | New-York | 15 | 12
4 | New-York | 15 | 24
The test table:
create table avttemperatures (
id int, city text, avg int, months int
);
insert into avttemperatures (id, city, avg, months) values
( 1,'New-York',20,3),
( 2,'New-York',19,6),
( 3,'New-York',15,12),
( 4,'New-York',15,24),
( 5,'Boston',13,3),
( 6,'Boston',18,8),
( 7,'Boston',17,12),
( 8,'Boston',16,15),
( 9,'Chicago',12,2),
( 10,'Chicago',14,12),
( 11,'Miami',28,1),
( 12,'Miami',25,4),
( 13,'Miami',21,12),
( 14,'Miami',22,15),
( 15,'Miami',20,24);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With