Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rolling joins (LOCF) in Postgres

Over time, I have gotten used to the incredibly useful functionality of data.table's rolling joins in r. These utilize the operation of LOCF (last observation carried forward). Unfortunately, I am forced to work in a environment that I am much less familiar with (using postgres). Is there an analogous operation in SQL (specifically postgres)?

Here is an example of what I have and the output I would like:

Here is my first table

dt1 = data.table(Date=seq(from=as.Date("2013-01-03"),
         to=as.Date("2013-06-27"), by="1 day"),key="Date")[, ind:=.I]

           Date ind
  1: 2013-01-03   1
  2: 2013-01-04   2
  3: 2013-01-05   3
  4: 2013-01-06   4
  5: 2013-01-07   5
 ---               
172: 2013-06-23 172
173: 2013-06-24 173
174: 2013-06-25 174
175: 2013-06-26 175
176: 2013-06-27 176

Here is my second table

dt2 = data.table(Date=seq(from=as.Date("2013-01-01"),
         to=as.Date("2013-06-30"), by="1 week"),key="Date")

          Date
 1: 2013-01-01
 2: 2013-01-08
 3: 2013-01-15
 4: 2013-01-22
 5: 2013-01-29
 ---
22: 2013-05-28
23: 2013-06-04
24: 2013-06-11
25: 2013-06-18
26: 2013-06-25

Here is the code I would use in data.table for the desired output.

dt1[dt2, roll=Inf]

          Date ind
 1: 2013-01-01  NA
 2: 2013-01-08   6
 3: 2013-01-15  13
 4: 2013-01-22  20
 5: 2013-01-29  27
---
22: 2013-05-28 146
23: 2013-06-04 153
24: 2013-06-11 160
25: 2013-06-18 167
26: 2013-06-25 174

Is this even possible using postgres (or more generally speaking, SQL? Thank you so much for any help you can provide.

like image 848
Mike.Gahan Avatar asked May 12 '14 16:05

Mike.Gahan


Video Answer


2 Answers

I'd really be interested to see if someone can do this without populating the full cross-join table first. But here's a solution with cross join:

http://sqlfiddle.com/#!2/b2f3f/3/0

Create the schema:

CREATE TABLE Table1
    (`t1` double, `ind` int)
;

INSERT INTO Table1
    (`t1`, `ind`)
VALUES
    (1, 1),
    (1.9, 2),
    (3.1, 3),
    (4, 4),
    (5.1, 5),
    (5.9, 6)
;

CREATE TABLE Table2
    (`t2` int)
;

INSERT INTO Table2
    (`t2`)
VALUES
    (1),
    (2),
    (3),
    (4),
    (5),
    (6)
;

Query:

select t2, max(ind)
from (select t2, ind
      from table1
      cross join table2
      where t1 <= t2) as foo
group by t2

Result:

T2  MAX(IND)
1   1
2   2
3   2
4   4
5   4
6   6

EDIT: @Hadley's comment is correct, that the full cross join table is never materialized using the query above, as the query above produces the same explain and results as the query below:

select t2, max(ind)
from table1
cross join table2
where t1 <= t2
group by t2
like image 65
Clayton Stanley Avatar answered Sep 29 '22 06:09

Clayton Stanley


Since I am working in a special Postgres environment, apparently it does not allow me to cross join. Thanks so much to @Clayton Stanley for the great answer, but I had to try another route. It looks to be working so far. I apologize for not being able to perform time comparisons of both methods.

Create the schema

CREATE TABLE Table1
    (`id` int,`t1` double, `ind` int)
;

INSERT INTO Table1
    (`id`,`t1`, `ind`)
VALUES
    (1,0.99, 5),
    (1,1.90, 10),
    (2,3.10, 12),
    (2,4.00, 3),
    (3,5.10, 8),
    (3,5.90, 16),
    (4,5.90, 7),
    (4,5.99, 20)
;



CREATE TABLE Table2
    (`id` int, `t2` double)
;

INSERT INTO Table2
    (`id`,`t2`)
VALUES
    (1,1.00),
    (2,3.95),
    (3,5.05),
    (4,6.01)
;

Perform the join using subqueries

select B.*
from Table2 as A
join Table1 as B
on B.id=A.id
join(
    select 
       SUBB.id,
       max(SUBB.t1) as t1
    from Table2 as SUBA
    join Table1 as SUBB
    on SUBB.id=SUBA.id and
    SUBB.t1 <= SUBA.t2
    group by SUBB.id
    )
as subqry
on B.t1=subqry.t1  and
   A.id=subqry.id

The example schema and output is here:

Link to schema

like image 41
Mike.Gahan Avatar answered Sep 29 '22 07:09

Mike.Gahan