Over time, I have gotten used to the incredibly useful functionality of data.table
's rolling joins in r
. These utilize the operation of LOCF
(last observation carried forward). Unfortunately, I am forced to work in a environment that I am much less familiar with (using postgres
). Is there an analogous operation in SQL (specifically postgres
)?
Here is an example of what I have and the output I would like:
Here is my first table
dt1 = data.table(Date=seq(from=as.Date("2013-01-03"),
to=as.Date("2013-06-27"), by="1 day"),key="Date")[, ind:=.I]
Date ind
1: 2013-01-03 1
2: 2013-01-04 2
3: 2013-01-05 3
4: 2013-01-06 4
5: 2013-01-07 5
---
172: 2013-06-23 172
173: 2013-06-24 173
174: 2013-06-25 174
175: 2013-06-26 175
176: 2013-06-27 176
Here is my second table
dt2 = data.table(Date=seq(from=as.Date("2013-01-01"),
to=as.Date("2013-06-30"), by="1 week"),key="Date")
Date
1: 2013-01-01
2: 2013-01-08
3: 2013-01-15
4: 2013-01-22
5: 2013-01-29
---
22: 2013-05-28
23: 2013-06-04
24: 2013-06-11
25: 2013-06-18
26: 2013-06-25
Here is the code I would use in data.table
for the desired output.
dt1[dt2, roll=Inf]
Date ind
1: 2013-01-01 NA
2: 2013-01-08 6
3: 2013-01-15 13
4: 2013-01-22 20
5: 2013-01-29 27
---
22: 2013-05-28 146
23: 2013-06-04 153
24: 2013-06-11 160
25: 2013-06-18 167
26: 2013-06-25 174
Is this even possible using postgres
(or more generally speaking, SQL
? Thank you so much for any help you can provide.
I'd really be interested to see if someone can do this without populating the full cross-join table first. But here's a solution with cross join:
http://sqlfiddle.com/#!2/b2f3f/3/0
Create the schema:
CREATE TABLE Table1
(`t1` double, `ind` int)
;
INSERT INTO Table1
(`t1`, `ind`)
VALUES
(1, 1),
(1.9, 2),
(3.1, 3),
(4, 4),
(5.1, 5),
(5.9, 6)
;
CREATE TABLE Table2
(`t2` int)
;
INSERT INTO Table2
(`t2`)
VALUES
(1),
(2),
(3),
(4),
(5),
(6)
;
Query:
select t2, max(ind)
from (select t2, ind
from table1
cross join table2
where t1 <= t2) as foo
group by t2
Result:
T2 MAX(IND)
1 1
2 2
3 2
4 4
5 4
6 6
EDIT: @Hadley's comment is correct, that the full cross join table is never materialized using the query above, as the query above produces the same explain and results as the query below:
select t2, max(ind)
from table1
cross join table2
where t1 <= t2
group by t2
Since I am working in a special Postgres
environment, apparently it does not allow me to cross join. Thanks so much to @Clayton Stanley for the great answer, but I had to try another route. It looks to be working so far. I apologize for not being able to perform time comparisons of both methods.
Create the schema
CREATE TABLE Table1
(`id` int,`t1` double, `ind` int)
;
INSERT INTO Table1
(`id`,`t1`, `ind`)
VALUES
(1,0.99, 5),
(1,1.90, 10),
(2,3.10, 12),
(2,4.00, 3),
(3,5.10, 8),
(3,5.90, 16),
(4,5.90, 7),
(4,5.99, 20)
;
CREATE TABLE Table2
(`id` int, `t2` double)
;
INSERT INTO Table2
(`id`,`t2`)
VALUES
(1,1.00),
(2,3.95),
(3,5.05),
(4,6.01)
;
Perform the join using subqueries
select B.*
from Table2 as A
join Table1 as B
on B.id=A.id
join(
select
SUBB.id,
max(SUBB.t1) as t1
from Table2 as SUBA
join Table1 as SUBB
on SUBB.id=SUBA.id and
SUBB.t1 <= SUBA.t2
group by SUBB.id
)
as subqry
on B.t1=subqry.t1 and
A.id=subqry.id
The example schema and output is here:
Link to schema
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With