Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Window function to fill timeline gaps

Tags:

sql

postgresql

Data

session                             time_interval       activity
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:40 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:41 (null)
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:42 (null)
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:43 (null)
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:44 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:45 (null)
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:46 running
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:47 (null)
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:48 (null)
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:49 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:50 walking
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:51 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:52 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:53 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:54 running
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:55 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:56 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:57 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:58 resting
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:59 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:17:00 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:17:01 (null)
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:17:02 walking

SQL

SELECT session, 
       time_interval, 
       activity, 
       FIRST_VALUE(activity) 
         OVER ( 
           PARTITION BY session 
           ORDER BY time_interval 
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS activity_b
FROM   my_table; 

But this only takes the first value of the session. How can I get the preceding value for each second?

Desired result

session                             time_interval       activity
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:40 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:41 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:42 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:43 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:44 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:45 walking
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:46 running
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:47 running
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:48 running
c889ddb532e76c961c2944dd90b10142    2017-05-25 20:16:49 running
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:50 walking
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:51 walking
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:52 walking
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:53 walking
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:54 running
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:55 running
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:56 running
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:57 running
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:58 resting
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:16:59 resting
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:17:00 resting
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:17:01 resting
dddjg894hlog8sdlf2090288fmma201c    2017-05-25 20:17:02 walking

SQL Fiddle is at capacity, so here's some DDL

CREATE TABLE public.my_table (
  session varchar(32),
  time_interval timestamp,
  activity varchar(10));

INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:40','walking');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:41','');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:42','');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:43','');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:44','walking');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:45','');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:46','running');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:47','');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:48','');
INSERT INTO public.my_table VALUES ('c889ddb532e76c961c2944dd90b10142','2017-05-25 20:16:49','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:50','walking');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:51','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:52','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:53','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:54','running');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:55','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:56','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:57','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:58','resting');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:16:59','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:17:00','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:17:01','');
INSERT INTO public.my_table VALUES ('dddjg894hlog8sdlf2090288fmma201c','2017-05-25 20:17:02','resting');
like image 661
Kermit Avatar asked May 26 '17 03:05

Kermit


2 Answers

This is really where you want the ignore nulls option. But that is not available. So, one way uses a max scan and a join:

select t.session, t.time_interval, tt.activity
from (select t.*,
              max(case when t.activity is not null then t.time_interval end) over (partition by t.session order by t.time_interval) as value_ti
      from t
     ) t left join
     t tt
     on t.value_ti = tt.time_interval and t.session = tt.session;

This calculates the most recent time interval for each row when the value was not NULL. It then joins back to get the activity at that time.

If you know that there are never more than 3 NULLs in a row, you can also use lag():

select t.session, t.time_interval,
       coalesce(t.activity,
                lag(t.activity, 1) over (partition by t.session order by t.time_interval),
                lag(t.activity, 2) over (partition by t.session order by t.time_interval),
                lag(t.activity, 3) over (partition by t.session order by t.time_interval)
               ) as acctivity
from t;
like image 180
Gordon Linoff Avatar answered Nov 16 '22 18:11

Gordon Linoff


note: in the example data activity is null for missing values, however in the provided insert statements activity is '' for missing values. I'm using '' for the example below. it should be trivial to alter this example to work with nulls instead

You can create "sub sessions" within where the activity is the same. The way to transform your data set to create the sub sessions is to increment a dummy variable anytime activity != ''. Once that's performed in an inner query, its possible to use the FIRST_VALUE window function to get the result you want.

SELECT 
  "session"
, time_interval
, FIRST_VALUE(activity) OVER (PARTITION BY "session", sub_session) activity
FROM (
  SELECT 
      *
    , SUM(CASE WHEN activity != '' THEN 1 ELSE 0 END) OVER (PARTITION BY "session" ORDER BY time_interval) sub_session
  FROM my_table
) sub
like image 2
Haleemur Ali Avatar answered Nov 16 '22 20:11

Haleemur Ali