Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pairing Sequential Events on Postgresql

We are logging the main flows of actions our users make on our iPad app on a table. Each flow has a start(tagged Started) and an end that is either tagged Cancelled or Finished, and there shouldn't be any overlapping events.

A set of flows Started, Cancelled or Finished for a user looks like this:

user_id             timestamp                   event_text      event_num
[email protected]   2016-10-30 00:08:00.966+00  Flow Started    0
[email protected]   2016-10-30 00:08:15.58+00   Flow Cancelled  2
[email protected]   2016-10-30 00:08:15.581+00  Flow Started    0
[email protected]   2016-10-30 00:34:44.134+00  Flow Finished   1
[email protected]   2016-10-30 00:42:26.102+00  Flow Started    0
[email protected]   2016-10-30 00:42:49.276+00  Flow Cancelled  2
[email protected]   2016-10-30 00:42:49.277+00  Flow Started    0
[email protected]   2016-10-30 00:59:47.337+00  Flow Cancelled  2
[email protected]   2016-10-30 00:59:47.337+00  Flow Started    0
[email protected]   2016-10-30 00:59:47.928+00  Flow Cancelled  2

We want to calculate how long a cancelled and finished flow last on average. For this we need to pair event Started with Canceled or Finished. The following code does that, however can't work around the following data quality issue that we have:

  • When a customer wants to start a new flow(let's call it Flow2) before ending the ongoing flow (Flow1), we shoot a cancelled event as we shoot the started event for the new flow. So Flow1 Cancelled=Flow2 Started. However when we use window functions to order and lead/lag between ordered events that actually belong to different flows get matched. By using this code:

    WITH track_scf AS (SELECT user_id, timestamp, event_text, CASE WHEN event_text LIKE '%Started%' THEN 0 when event_text like '%Cancelled%' then 2 ELSE 1 END AS event_num FROM tracks ORDER BY 2, 4 desc ) SELECT user_id, CASE WHEN event_num=0 then timestamp end as start,CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) <> 0 THEN LEAD(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) END as end, CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) <> 0 THEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) END as action FROM track_scf
    

We get this result:

user_id             start                       end                         action
[email protected]   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
[email protected]   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
[email protected]   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
[email protected]   2016-10-30 00:42:49.277+00  NULL                        NULL
[email protected]   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.337+00  2
[email protected]   NULL                        2016-10-30 00:59:47.928+00  2

But we should get this:

user_id             start                       end                         action
[email protected]   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
[email protected]   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
[email protected]   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
[email protected]   2016-10-30 00:42:49.277+00  2016-10-30 00:59:47.337+00  2
[email protected]   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.928+00  2

How do I need to alter the code so that the pairing is correct?

like image 797
mls.z Avatar asked Oct 30 '22 16:10

mls.z


1 Answers

select      user_id       
           ,"start"                       
           ,"end"                         
           ,"action"

from       (select      user_id
                       ,timestamp                 as "start"
                       ,lead (event_num)   over w as "action"
                       ,lead ("timestamp") over w as "end"
                       ,event_num

            from        tracks t

            window      w as (partition by user_id order by "timestamp",event_num desc)
            ) t

where       t.event_num = 0
;
like image 51
David דודו Markovitz Avatar answered Nov 15 '22 05:11

David דודו Markovitz