Over time, I have gotten used to the incredibly useful functionality of data.table's rolling joins in r. These utilize the operation of LOCF (last observation carried forward). Unfortunately, I am forced to work in a environment that I am much less familiar with (using postgres). Is there an analogous operation in SQL (specifically postgres)?
Here is an example of what I have and the output I would like:
Here is my first table
dt1 = data.table(Date=seq(from=as.Date("2013-01-03"),
to=as.Date("2013-06-27"), by="1 day"),key="Date")[, ind:=.I]
Date ind
1: 2013-01-03 1
2: 2013-01-04 2
3: 2013-01-05 3
4: 2013-01-06 4
5: 2013-01-07 5
---
172: 2013-06-23 172
173: 2013-06-24 173
174: 2013-06-25 174
175: 2013-06-26 175
176: 2013-06-27 176
Here is my second table
dt2 = data.table(Date=seq(from=as.Date("2013-01-01"),
to=as.Date("2013-06-30"), by="1 week"),key="Date")
Date
1: 2013-01-01
2: 2013-01-08
3: 2013-01-15
4: 2013-01-22
5: 2013-01-29
---
22: 2013-05-28
23: 2013-06-04
24: 2013-06-11
25: 2013-06-18
26: 2013-06-25
Here is the code I would use in data.table for the desired output.
dt1[dt2, roll=Inf]
Date ind
1: 2013-01-01 NA
2: 2013-01-08 6
3: 2013-01-15 13
4: 2013-01-22 20
5: 2013-01-29 27
---
22: 2013-05-28 146
23: 2013-06-04 153
24: 2013-06-11 160
25: 2013-06-18 167
26: 2013-06-25 174
Is this even possible using postgres (or more generally speaking, SQL? Thank you so much for any help you can provide.
I'd really be interested to see if someone can do this without populating the full cross-join table first. But here's a solution with cross join:
http://sqlfiddle.com/#!2/b2f3f/3/0
Create the schema:
CREATE TABLE Table1
(`t1` double, `ind` int)
;
INSERT INTO Table1
(`t1`, `ind`)
VALUES
(1, 1),
(1.9, 2),
(3.1, 3),
(4, 4),
(5.1, 5),
(5.9, 6)
;
CREATE TABLE Table2
(`t2` int)
;
INSERT INTO Table2
(`t2`)
VALUES
(1),
(2),
(3),
(4),
(5),
(6)
;
Query:
select t2, max(ind)
from (select t2, ind
from table1
cross join table2
where t1 <= t2) as foo
group by t2
Result:
T2 MAX(IND)
1 1
2 2
3 2
4 4
5 4
6 6
EDIT: @Hadley's comment is correct, that the full cross join table is never materialized using the query above, as the query above produces the same explain and results as the query below:
select t2, max(ind)
from table1
cross join table2
where t1 <= t2
group by t2
Since I am working in a special Postgres environment, apparently it does not allow me to cross join. Thanks so much to @Clayton Stanley for the great answer, but I had to try another route. It looks to be working so far. I apologize for not being able to perform time comparisons of both methods.
Create the schema
CREATE TABLE Table1
(`id` int,`t1` double, `ind` int)
;
INSERT INTO Table1
(`id`,`t1`, `ind`)
VALUES
(1,0.99, 5),
(1,1.90, 10),
(2,3.10, 12),
(2,4.00, 3),
(3,5.10, 8),
(3,5.90, 16),
(4,5.90, 7),
(4,5.99, 20)
;
CREATE TABLE Table2
(`id` int, `t2` double)
;
INSERT INTO Table2
(`id`,`t2`)
VALUES
(1,1.00),
(2,3.95),
(3,5.05),
(4,6.01)
;
Perform the join using subqueries
select B.*
from Table2 as A
join Table1 as B
on B.id=A.id
join(
select
SUBB.id,
max(SUBB.t1) as t1
from Table2 as SUBA
join Table1 as SUBB
on SUBB.id=SUBA.id and
SUBB.t1 <= SUBA.t2
group by SUBB.id
)
as subqry
on B.t1=subqry.t1 and
A.id=subqry.id
The example schema and output is here:
Link to schema
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With