Can window function LAG reference the column which value is being calculated?

Tags:

I need to calculate value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement query in the form

SELECT <some fields>, 
  <some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...) AS X
FROM <table>

This is not possible because only existing columns can be used in window function so I'm looking way how to overcome this.

Here is an example. I have a table with events. Each event has type and time_stamp.

create table event (id serial, type integer, time_stamp integer);

I wan't to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for given type by time_stamp ascending. Then

the first event is not a duplicate
all events that follow non duplicate and are within some time frame after it (that is their time_stamp is not greater then time_stamp of the previous non duplicate plus some constant TIMEFRAME) are duplicates
the next event which time_stamp is greater than previous non duplicate by more than TIMEFRAME is not duplicate
and so on

For this data

insert into event (type, time_stamp) 
 values 
  (1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10), 
  (1,15), (1, 21), (2,13), 
  (1, 40);

and TIMEFRAME=10 result should be

time_stamp | type | duplicate
-----------------------------
        1  |    1 | false
        2  |    1 | true     
        3  |    1 | true 
       10  |    1 | true 
       15  |    1 | false 
       21  |    1 | true
       40  |    1 | false
        2  |    2 | false
       10  |    2 | true
       13  |    2 | false

I could calculate the value of duplicate field based on current time_stamp and time_stamp of the previous non-duplicate event like this:

WITH evt AS (
  SELECT 
    time_stamp, 
    CASE WHEN 
      time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
    THEN 
      time_stamp
    ELSE
      LAG(current_non_dupl_time_stamp) OVER w
    END AS current_non_dupl_time_stamp
  FROM event
  WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate

But this does not work because the field which is calculated cannot be referenced in LAG:

ERROR:  column "current_non_dupl_time_stamp" does not exist.

So the question: can I rewrite this query to achieve the effect I need?

382

asked Dec 17 '15 16:12

Roman Konoval

2 Answers

Naive recursive chain knitter:

        -- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
        SELECT e.type,e.time_stamp
        , ROW_NUMBER() OVER www as rn                   -- number the records
        , FIRST_VALUE(e.time_stamp) OVER www as fst     -- the "group leader"
        , EXISTS (SELECT * FROM event x
                WHERE x.type = e.type
                AND x.time_stamp < e.time_stamp) AS is_dup
        FROM event e
        WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
        ;

WITH RECURSIVE ttt AS (
        SELECT d0.*
        FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
    UNION ALL
        SELECT d1.type, d1.time_stamp, d1.rn
          , CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
                 ELSE ttt.fst END AS fst   -- new "group leader"
          , CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
                 ELSE True END AS is_dup
        FROM drag d1
        JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
        )
SELECT * FROM ttt
ORDER BY type, time_stamp
        ;

Results:

CREATE TABLE
INSERT 0 10
CREATE VIEW
 type | time_stamp | rn | fst | is_dup 
------+------------+----+-----+--------
    1 |          1 |  1 |   1 | f
    1 |          2 |  2 |   1 | t
    1 |          3 |  3 |   1 | t
    1 |         10 |  4 |   1 | t
    1 |         15 |  5 |   1 | t
    1 |         21 |  6 |   1 | t
    1 |         40 |  7 |  40 | f
    2 |          2 |  1 |   2 | f
    2 |         10 |  2 |   2 | t
    2 |         13 |  3 |   2 | t
(10 rows)

194

answered Oct 13 '22 07:10

wildplasser

An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.

State transition function:

create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
    if st is null or st[1] + timeframe <= time_stamp
    then 
        st[1] := time_stamp;
    end if;
    st[2] := time_stamp;
    return st;
end $$;

Final function:

create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
    select st[1] <> st[2];
$$;

Aggregate:

create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
    sfunc = is_duplicate,
    stype = int[],
    finalfunc = is_duplicate_final
);

Query:

select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;

 id | type | time_stamp | is_duplicate_agg 
----+------+------------+------------------
  1 |    1 |          1 | f
  2 |    1 |          2 | t
  4 |    1 |          3 | t
  5 |    1 |         10 | t
  7 |    1 |         15 | f
  8 |    1 |         21 | t
 10 |    1 |         40 | f
  3 |    2 |          2 | f
  6 |    2 |         10 | t
  9 |    2 |         13 | f
(10 rows)

Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.

answered Oct 13 '22 07:10

klin

Related questions
                            
                                User-created database structure: non-relational or relational databases?
                            
                                Indexing array of strings column type in PostgreSQL
                            
                                What is generally faster, grepping through files or running a SQL LIKE %x% query through blobs?
                            
                                Cluster points in PostGIS
                            
                                cannot install pg gem, checking for PQconnectdb() in -lpq... no
                            
                                Entity-Framework 5.0 with PostgreSQL
                            
                                Many-to-many in sqlalchemy. Preventing SQLAlchemy from inserting into a table if a tag already exists
                            
                                PostgreSQL on cloudfoundry throws PSQLException: FATAL: terminating connection due to administrator command
                            
                                COPY FROM and C3PO connection pool in Postgres
                            
                                PostgreSQL - SQL state: 42601 syntax error
                            
                                Currency/monetary values -- How to store in a DB & transfer using JSON?
                            
                                Complex SQL query with multiple tables and relations
                            
                                Postgresql full text search tokenizer
                            
                                Hibernate + PostgreSQL - Connection with state "Idle in transaction"
                            
                                postgres - cannot drop database using psycopg2
                            
                                PostgreSQL triggers are not called in test environment (while running RSpec)
                            
                                Docker Backup and restore postgres
                            
                                How can ransack use pg_search?
                            
                                How to get PostgreSQL hashes in Phoenix's Ecto models?
                            
                                How do I increase the speed of my Postgres select statement?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can window function LAG reference the column which value is being calculated?

Tags:

postgresql

gaps-and-islands

Roman Konoval

People also ask

2 Answers

wildplasser

klin

Recent Activity

Donate For Us