I have two data.frames each with a time series. My goal ist to use the timeseries of df2 to mark the closest timestamp in df1. Each timestamp in df2 should only mark ONE timestamp in df1!
dates1 <-  as.POSIXct(c("2015-10-26 12:00:00","2015-10-26 13:00:00","2015-10-26 14:00:00"))
values1 <- c("a","b","c")
dates2 <- as.POSIXct(c("2015-10-26 12:05:00","2015-10-26 13:55:00"))
values2 <- c("A","C")
df1 <- data.frame(dates1, values1)
df2 <- data.frame(dates2, values2)
desired outcome:
                dates2 values2 values1
1: 2015-10-26 12:00:00       A       a
2: 2015-10-26 13:00:00       NA      b
3: 2015-10-26 14:00:00       C       c
In order to achieve this, I'm converting the data.frames to data.tables and using the rolling joing "nearest" like so:
dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1,"dates1")
setkey(dt2,"dates2")
dt3 <- dt2[dt1,roll = "nearest"]
                dates2 values2 values1
1: 2015-10-26 12:00:00       A       a
2: 2015-10-26 13:00:00       A       b
3: 2015-10-26 14:00:00       C       c
The values2 "A" is used twice, once for the 12:00 timestamp and once for 13:00. I want each value2 to be used only once, and consulting the data.table manual I would expect to solve this problem with using the option mult = "first".
dt3 <- dt2[dt1,roll = "nearest", mult = "first"]
This results in the same output, "A" is used twice. Where is my mistake?
When running dt2[dt1, roll = "nearest"] you are basically saying "return the rows from dt2 according to the nearest join to each row in dt1 using the key. So 
dates2 in row one in dt2 is the nearest to dates1 in row one in dt1
dates2 in row one in dt2 is the nearest to  dates1 in row two in dt1
dates2 in row two in dt2 is the nearest to  dates1 in row three in dt1
Hence,
dt2[dt1, roll = "nearest"]
#                 dates2 values2 values1
# 1: 2015-10-26 12:00:00       A       a
# 2: 2015-10-26 13:00:00       A       b
# 3: 2015-10-26 14:00:00       C       c
Which are all the rows from dt1 with the joined values2 from dt2.
Instead, we want to join the other way around, namely "extract values2 from dt2 according to the nearest join by each row in dt2 using the key and update the matched rows in dt1", namely
dt1[dt2, roll = "nearest", values2 := i.values2] 
dt1
#                 dates1 values1 values2
# 1: 2015-10-26 12:00:00       a       A
# 2: 2015-10-26 13:00:00       b      NA
# 3: 2015-10-26 14:00:00       c       C
Some additional notes
data.frame and then to data.table, you can just do dt1 <- data.table(dates1, values1) and etc.key parameter data.table, namely dt1 <- data.table(dates1, values1, key = "dates1") and etc.on instead (V 1.9.6+), namely dt1[dt2, roll = "nearest", values2 := i.values2, on = c(dates1 = "dates2")]
<- and data.table(df) use := and setDT(df), see here for more informationIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With