Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing self joins by window functions

I am working with following sample data;

       dt   |   ship_id     |   audit_id   | action

 2022-01-02 |     1351      |     id1      | destroy
 2022-01-01 |     1351      |     id1      | create
 2021-12-12 |     3457      |     id2      | create
 2021-12-16 |     3457      |     id2      | destroy
 2021-12-28 |     3457      |     id3      | create
 

To give some context, for a given ship_id, and audit_id; an entry has to be created before it is destroyed as defined by action column. For example, for ship_id=3457, and audit_id=id2; got created on Dec 12 and destroyed on Dec 16.

The goal is to get, for every dt(when action is created), how many audit_ids are created before it , and how many audit_ids are destroyed before it.

Sample output:

       dt  |  created_cnt  |  destroyed_cnt

2022-01-01 |     2         |    1

Possible Approach Using self join idea.

 select
  audit_id,
  ship_id,
  max(case when action = 'create' then dt end) as creation_time,
  max(case when action = 'destroy' then dt end) as removal_time

from table 
group by 1,2)

select
 t1.creation_time as creation_date,
 count(t2.audit_id) as created_cnt,
 count(distinct case when  t2.removal_time <  t1.creation_time then t2.audit_id end) as 
    destroyed_cnt

from cte as t1
left join cte as t2 on  t1.creation_time > t2.creation_time
group by 1 
order by 1 desc;

But due to large table, this self-join is slowing things down. Is it possible to use some sort of window functions here to replace joining? Help is appreciated.

like image 256
jay Avatar asked Mar 02 '23 08:03

jay


1 Answers

Check this solution with over(order by dt rows between unbounded preceding and 1 preceding):

with data as (
    select $1 dt, $2 ship, $3 audit, $4 action
    from values('2022-01-02', 1, 'id1', 'destroy')
    , ('2022-01-01', 1, 'id1', 'create')
    , ('2021-12-12', 2, 'id2', 'create')
    , ('2020-12-16', 2, 'id2', 'destroy')
    , ('2020-12-28', 2, 'id3', 'create')
)

select dt
    , sum(iff(action='create',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) created_cnt
    , sum(iff(action='destroy',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data

enter image description here

like image 186
Felipe Hoffa Avatar answered Apr 29 '23 09:04

Felipe Hoffa