I'm trying to find a way to compare against the current row in the PARTITION BY clause of a window function in a PostgreSQL query.
Imagine I have the short list of 5 elements in the following query (in the real case, I have thousands or even millions of rows). For each row, I am trying to get the id of the next different element (event column) and the id of the previous different element.
WITH events AS(
SELECT 1 as id, 12 as event, '2014-03-19 08:00:00'::timestamp as date
UNION SELECT 2 as id, 12 as event, '2014-03-19 08:30:00'::timestamp as date
UNION SELECT 3 as id, 13 as event, '2014-03-19 09:00:00'::timestamp as date
UNION SELECT 4 as id, 13 as event, '2014-03-19 09:30:00'::timestamp as date
UNION SELECT 5 as id, 12 as event, '2014-03-19 10:00:00'::timestamp as date
)
SELECT lag(id) over w as previous_different, event
, lead(id) over w as next_different
FROM events ev
WINDOW w AS (PARTITION BY event!=ev.event ORDER BY date ASC);
I know the comparison event!=ev.event is incorrect, but that's the point I want to reach.
The result I get is (the same as if I delete the PARTITION BY clause):
previous_different | event | next_different
                   | 12    | 2
1                  | 12    | 3
2                  | 13    | 4
3                  | 13    | 5
4                  | 12    |
And the result I would like to get is:
previous_different | event | next_different
                   | 12    | 3
                   | 12    | 3
2                  | 13    | 5
2                  | 13    | 5
4                  | 12    |
Does anyone know if this is possible, and how? Thank you very much!
EDIT: I know I can do it with two JOINs, an ORDER BY and a DISTINCT ON, but in the real case of millions of rows it is very inefficient:
WITH events AS(
SELECT 1 as id, 12 as event, '2014-03-19 08:00:00'::timestamp as date
UNION SELECT 2 as id, 12 as event, '2014-03-19 08:30:00'::timestamp as date
UNION SELECT 3 as id, 13 as event, '2014-03-19 09:00:00'::timestamp as date
UNION SELECT 4 as id, 13 as event, '2014-03-19 09:30:00'::timestamp as date
UNION SELECT 5 as id, 12 as event, '2014-03-19 10:00:00'::timestamp as date
)
SELECT DISTINCT ON (e.id, e.date) e1.id, e.event, e2.id
FROM events e
LEFT JOIN events e1 ON (e1.date<=e.date AND e1.id!=e.id AND e1.event!=e.event)
LEFT JOIN events e2 ON (e2.date>=e.date AND e2.id!=e.id AND e2.event!=e.event)
ORDER BY e.date ASC, e.id ASC, e1.date DESC, e1.id DESC, e2.date ASC, e2.id ASC
Perhaps the only similarity between GROUP BY and window functions is that both allow you to execute a function (such as AVG, MAX, MIN, or COUNT) on a group of records. I'd say that the special power of window functions is that they allow us to obtain results that would otherwise be almost impossible to achieve.
The reason window functions are not allowed in GROUP BY is the order of operations in SQL: the clauses of a query are processed in a different order than they are written.
Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in the GROUP BY, HAVING and WHERE clauses.
GROUP BY only offers aggregate functions, whereas window functions offer aggregate, ranking, and value functions.
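To illustrate the difference, here is a minimal sketch with the same sample values as in the question (only id and event, since the timestamps don't matter for this point): GROUP BY collapses the rows, while a window function keeps every row and attaches the aggregate to each of them.
-- GROUP BY: one output row per event
WITH events(id, event) AS (
   VALUES (1, 12), (2, 12), (3, 13), (4, 13), (5, 12)
)
SELECT event, count(*) AS cnt
FROM   events
GROUP BY event;

-- Window function: every row is kept, the aggregate is repeated per partition
WITH events(id, event) AS (
   VALUES (1, 12), (2, 12), (3, 13), (4, 13), (5, 12)
)
SELECT id, event
     , count(*) OVER (PARTITION BY event) AS cnt
FROM   events;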
Using several different window functions and two subqueries, this should work decently fast:
WITH events(id, event, ts) AS (
   VALUES
      (1, 12, '2014-03-19 08:00:00'::timestamp)
    , (2, 12, '2014-03-19 08:30:00')
    , (3, 13, '2014-03-19 09:00:00')
    , (4, 13, '2014-03-19 09:30:00')
    , (5, 12, '2014-03-19 10:00:00')
   )
SELECT first_value(pre_id) OVER (PARTITION BY grp ORDER BY ts) AS pre_id
     , id, ts
     , first_value(post_id) OVER (PARTITION BY grp ORDER BY ts DESC) AS post_id
FROM  (
   SELECT *, count(step) OVER w AS grp
   FROM  (
      SELECT id, ts
           , NULLIF(lag(event) OVER w, event) AS step
           , lag(id)  OVER w AS pre_id
           , lead(id) OVER w AS post_id
      FROM   events
      WINDOW w AS (ORDER BY ts)
      ) sub1
   WINDOW w AS (ORDER BY ts)
   ) sub2
ORDER BY ts;
Using ts as the name for the timestamp column. Assuming ts to be unique - and indexed (a UNIQUE constraint does that automatically).
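The CTE above is only for demonstration; on a real table, that assumption could be enforced like this (constraint and index names are hypothetical):
-- assumed: a persisted table named events with a ts column
ALTER TABLE events ADD CONSTRAINT events_ts_uni UNIQUE (ts);
-- or, equivalently for this purpose, a plain unique index:
-- CREATE UNIQUE INDEX events_ts_idx ON events (ts);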
In a test with a real-life table of 50k rows, it only needed a single index scan, so it should be decently fast even with big tables. In comparison, your query with JOIN / DISTINCT ON did not finish within a minute (as expected).
Even an optimized version, dealing with one cross join at a time (the LEFT JOIN with hardly a limiting condition is effectively a limited cross join), did not finish within a minute.
For best performance with a big table, tune your memory settings, in particular work_mem (for big sort operations). Consider setting it (much) higher for your session temporarily if you can spare the RAM.
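For example (a minimal sketch - the value is only a placeholder, pick one your RAM allows):
SET work_mem = '256MB';   -- affects only the current session
-- ... run the big query here ...
RESET work_mem;           -- back to the configured default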
In subquery sub1, look at the event of the previous row and only keep it if it has changed, thus marking the first element of a new group. At the same time, get the id of the previous and the next row (pre_id, post_id).
In subquery sub2, count() only counts non-null values. The resulting grp marks peers in blocks of consecutive same events.
In the final SELECT, take the first pre_id and the last post_id per group for each row to arrive at the desired result.
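To make this concrete, for the sample data the intermediate values in sub2 should come out like this (event is shown only for readability; it is not actually carried through sub1):
id | event | step | grp | pre_id | post_id
 1 |    12 |      |   0 |        |       2
 2 |    12 |      |   0 |      1 |       3
 3 |    13 |   12 |   1 |      2 |       4
 4 |    13 |      |   1 |      3 |       5
 5 |    12 |   13 |   2 |      4 |
The final SELECT then picks the first pre_id per grp (NULL, NULL, 2, 2, 4) and the last post_id per grp (3, 3, 5, 5, NULL), which matches the desired output from the question.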
Actually, this should be even faster in the outer SELECT:
last_value(post_id) OVER (PARTITION BY grp ORDER BY ts
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS post_id
... since the sort order of the window agrees with the window for pre_id, only a single sort is needed. A quick test seems to confirm it. See the PostgreSQL documentation on window function calls for more about this frame definition.
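For completeness, a sketch of the full query with the last_value() variant plugged in (the CTE and the subqueries are unchanged):
WITH events(id, event, ts) AS (
   VALUES
      (1, 12, '2014-03-19 08:00:00'::timestamp)
    , (2, 12, '2014-03-19 08:30:00')
    , (3, 13, '2014-03-19 09:00:00')
    , (4, 13, '2014-03-19 09:30:00')
    , (5, 12, '2014-03-19 10:00:00')
   )
SELECT first_value(pre_id) OVER (PARTITION BY grp ORDER BY ts) AS pre_id
     , id, ts
     , last_value(post_id) OVER (PARTITION BY grp ORDER BY ts
                                 RANGE BETWEEN UNBOUNDED PRECEDING
                                 AND UNBOUNDED FOLLOWING) AS post_id
FROM  (
   SELECT *, count(step) OVER w AS grp
   FROM  (
      SELECT id, ts
           , NULLIF(lag(event) OVER w, event) AS step
           , lag(id)  OVER w AS pre_id
           , lead(id) OVER w AS post_id
      FROM   events
      WINDOW w AS (ORDER BY ts)
      ) sub1
   WINDOW w AS (ORDER BY ts)
   ) sub2
ORDER BY ts;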