In PostgreSQL 9.4 the window functions have the new option of a FILTER
to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including from 2ndQuadrant but all that I found were rather trivial examples with constant expressions. What I am looking for is a filter expression that includes the value of the current row.
Assume I have a table with a bunch of columns, one of which is of date
type:
col1 | col2 | dt ------------------------ 1 | a | 2015-07-01 2 | b | 2015-07-03 3 | c | 2015-07-10 4 | d | 2015-07-11 5 | e | 2015-07-11 6 | f | 2015-07-13 ...
A window definition for processing on the date
over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)
I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:
col1 | col2 | dt | count -------------------------------- 1 | a | 2015-07-01 | 1 2 | b | 2015-07-03 | 2 3 | c | 2015-07-10 | 1 4 | d | 2015-07-11 | 3 5 | e | 2015-07-11 | 3 6 | f | 2015-07-13 | 4 ...
The FILTER
clause of the window functions seems like the obvious choice:
count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win
But how do I specify current_row.dt
(for lack of a better syntax)? Is this even possible?
If this is not possible, are there other ways of selecting date
ranges in a window frame? The frame specification is no help as it is all row-based.
I am not interested in alternative solutions using sub-queries, it has to be based on window processing.
Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY , HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions.
The OVER clause is mandatory for window functions and differentiates window functions from other SQL functions. (Optional) The PARTITION BY clause subdivides the result set into partitions, much like the GROUP BY clause. If a partition clause is present, the function is calculated for the rows in each partition.
Window functions are permitted only in the select list and ORDER BY clause. Query result rows are determined from the FROM clause, after WHERE , GROUP BY , and HAVING processing, and windowing execution occurs before ORDER BY , LIMIT , and SELECT DISTINCT .
If the window ORDER BY clause is omitted, then rows are processed in an unspecified order so that the results of any window function invoked in this way would be unpredictable and therefore meaningless.
You are not actually aggregating rows, so the new aggregate FILTER
clause is not the right tool. A window function is more like it, a problem remains, however: the frame definition of a window cannot depend on values of the current row. It can only count a given number of rows preceding or following with the ROWS
clause.
To make that work, aggregate counts per day and LEFT JOIN
to a full set of days in range. Then you can apply a window function:
SELECT t.*, ct.ct_last4days
FROM (
SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
FROM (
SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
FROM tbl t1
) d
LEFT JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
) ct
JOIN tbl t USING (dt);
Omitting ORDER BY dt
in the widow frame definition usually works, since the order is carried over from generate_series()
in the subquery. But there are no guarantees in the SQL standard without explicit ORDER BY
and it might break in more complex queries.
SQL Fiddle.
Related:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With