Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Referencing current row in FILTER clause of window function

In PostgreSQL 9.4 the window functions have the new option of a FILTER to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including from 2ndQuadrant but all that I found were rather trivial examples with constant expressions. What I am looking for is a filter expression that includes the value of the current row.

Assume I have a table with a bunch of columns, one of which is of date type:

col1 | col2 |     dt
------------------------
  1  |  a   | 2015-07-01
  2  |  b   | 2015-07-03
  3  |  c   | 2015-07-10
  4  |  d   | 2015-07-11
  5  |  e   | 2015-07-11
  6  |  f   | 2015-07-13
...

A window definition for processing on the date over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)

I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:

col1 | col2 |     dt     | count
--------------------------------
  1  |  a   | 2015-07-01 |   1
  2  |  b   | 2015-07-03 |   2
  3  |  c   | 2015-07-10 |   1
  4  |  d   | 2015-07-11 |   3
  5  |  e   | 2015-07-11 |   3
  6  |  f   | 2015-07-13 |   4
...

The FILTER clause of the window functions seems like the obvious choice:

count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win

But how do I specify current_row.dt (for lack of a better syntax)? Is this even possible?

If this is not possible, are there other ways of selecting date ranges in a window frame? The frame specification is no help as it is all row-based.

I am not interested in alternative solutions using sub-queries, it has to be based on window processing.

like image 295
Patrick Avatar asked Jul 14 '15 02:07

Patrick


People also ask

Can you use window function in having clause?

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY , HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions.

Which clause identifies a window function?

The OVER clause is mandatory for window functions and differentiates window functions from other SQL functions. (Optional) The PARTITION BY clause subdivides the result set into partitions, much like the GROUP BY clause. If a partition clause is present, the function is calculated for the rows in each partition.

Which of the following are the clauses to implement window functions?

Window functions are permitted only in the select list and ORDER BY clause. Query result rows are determined from the FROM clause, after WHERE , GROUP BY , and HAVING processing, and windowing execution occurs before ORDER BY , LIMIT , and SELECT DISTINCT .

What is happening when you omit the ORDER BY clause when doing aggregates with window functions?

If the window ORDER BY clause is omitted, then rows are processed in an unspecified order so that the results of any window function invoked in this way would be unpredictable and therefore meaningless.


1 Answers

You are not actually aggregating rows, so the new aggregate FILTER clause is not the right tool. A window function is more like it, a problem remains, however: the frame definition of a window cannot depend on values of the current row. It can only count a given number of rows preceding or following with the ROWS clause.

To make that work, aggregate counts per day and LEFT JOIN to a full set of days in range. Then you can apply a window function:

SELECT t.*, ct.ct_last4days
FROM  (
   SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
   FROM  (
      SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
      FROM   tbl t1
      ) d
   LEFT   JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
   ) ct
JOIN  tbl t USING (dt);

Omitting ORDER BY dt in the widow frame definition usually works, since the order is carried over from generate_series() in the subquery. But there are no guarantees in the SQL standard without explicit ORDER BY and it might break in more complex queries.

SQL Fiddle.

Related:

  • Select finishes where athlete didn't finish first for the past 3 events
  • PostgreSQL: running count of rows for a query 'by minute'
  • PostgreSQL unnest() with element number
like image 189
Erwin Brandstetter Avatar answered Sep 30 '22 10:09

Erwin Brandstetter